Consider a connected network of n nodes that all wish to recover k desired packets. Each node
begins with a subset of the desired packets and exchanges coded packets with its neighbors. This paper
provides necessary and sufficient conditions which characterize the set of all transmission schemes
that permit every node to ultimately learn (recover) all k packets. When the network satisfies certain
regularity conditions and packets are randomly distributed, this paper provides tight concentration results
on the number of transmissions required to achieve universal recovery. For the case of a fully connected
network, a polynomial-time algorithm for computing an optimal transmission scheme is derived. An
application to secrecy generation is discussed.

I. Introduction

Consider a connected network of n nodes that all wish to recover k desired packets. Each
node begins with a subset of the desired packets and broadcasts messages to its neighbors over
discrete, memoryless, and interference-free channels. Furthermore, every node knows which
packets are already known by each node and knows the topology of the network. How many
transmissions are required to disseminate the k packets to every node in the network? How
should this be accomplished? These are the essential questions addressed. We refer to this as the 
Coded Cooperative Data Exchange problem, or just the Cooperative Data Exchange problem. 

This work is motivated in part by emerging issues in distributed data storage. Consider the 
problem of backing up data on servers in a large data center. One commonly employed method 
to protect data from corruption is replication. Using this method, large quantities of data are 
replicated in several locations so as to protect from various sources of corruption (e.g., equipment 
failure, power outages, natural disasters, etc.). As the quantity of information in large data centers 
continues to increase, the number of file transfers required to complete a periodic replication task 
is becoming an increasingly important consideration due to time, equipment, cost, and energy 
constraints. The results contained in this paper address these issues. 

This model also has natural applications in the context of tactical networks, and we give 
one of them here. Consider a scenario in which an aircraft flies over a group of nodes on the 
ground and tries to deliver a video stream. Each ground node might only receive a subset of 
the transmitted packets due to interference, obstructions, and other signal integrity issues. In 
order to recover the transmission, the nodes are free to communicate with their neighbors, but 
would like to minimize the number of transmissions in order to conserve battery power (or avoid 
detection, etc.). How should the nodes share information, and what is the minimum number of 
transmissions required so that the entire network can recover the video stream? 

Beyond the examples mentioned above, the results presented herein can also be applied to 
practical secrecy generation amongst a collection of nodes. We consider this application in detail 
in Section IV.

A. Related Work 

Distributed data exchange problems have received a great deal of attention over the past 
several years. The powerful techniques afforded by network coding [4], [5] have paved the way
for cooperative communications at the packet level.

The coded cooperative data exchange problem (also called the universal recovery problem in
[1]-[3]) was originally introduced by El Rouayheb et al. in [6], [7] for a fully connected network
(i.e., a single-hop network). For this special case, a randomized algorithm for finding an optimal
transmission scheme was given in [8], and the first deterministic algorithm was recently given
in [9]. In the concluding remarks of [9], the weighted universal recovery problem (in which
the objective is to minimize the weighted sum of transmissions by nodes) was posed as an
open problem. However, this was solved using a variant of the same algorithm in [10], and
independently by the present authors using a submodular algorithm in [3].

The coded cooperative data exchange problem is related to the index coding problem originally
introduced by Birk and Kol in [11]. Specifically, generalizing the index coding problem to permit
each node to be a transmitter (instead of having a single server) and further generalizing so that
the network need not be a single-hop network leads to a class of problems that includes our
problem as a special case in which each node desires to receive all packets.

One significant result in index coding is that nonlinear index coding outperforms the best
linear index code in certain cases [12], [13]. As discussed above, our problem is a special case
of the generalized index coding problem, and it turns out that linear encoding does achieve the
minimum number of transmissions required for universal recovery and this solution is computable
in polynomial time for some important cases.

This paper applies principles of cooperative data exchange to generate secrecy in the presence
of an eavesdropper. In this context, the secrecy generation problem was originally studied in [14].
In [14], Csiszár and Narayan gave single-letter characterizations of the secret-key and private-key
capacities for a network of nodes connected by an error-free broadcast channel. While
general and powerful, these results left two practical issues as open questions. First (as with
many information-theoretic investigations), the results require the nodes to observe arbitrarily
long sequences of i.i.d. source symbols, which is generally not practical. Second, no efficient
algorithm is provided in [14] which achieves the respective secrecy capacities. More recent work
in [15], [16] addressed the latter point.



B. Our Contributions 

In this paper, we provide necessary and sufficient conditions for achieving universal recovery¹
in arbitrarily connected multihop networks. We specialize these necessary and sufficient
conditions to obtain precise results in the case where the underlying network topology satisfies some
modest regularity conditions.

¹ In this paper, we use the term universal recovery to refer to the ultimate condition where every node has successfully
recovered all packets.



For the case of a fully connected network, we provide an algorithm based on submodular
optimization which solves the cooperative data exchange problem. This algorithm differs from
the others previously appearing in the literature (cf. [8]-[10]) in that it exploits submodularity.
As a corollary, we provide exact concentration results when packets are randomly distributed in
a network.

In this same vein, we also obtain tight concentration results and approximate solutions when
the underlying network is d-regular and packets are distributed randomly.

Furthermore, if packets are divisible (allowing transmissions to consist of partial packets), we
prove that the traditional cut-set bounds can be achieved for any network topology. In the case
of d-regular and fully connected networks, we show that splitting packets does not typically
provide any significant benefits.



Finally, for the application to secrecy generation, we leverage the results of [14] in the context
of the cooperative data exchange problem for a fully connected network. In doing so, we provide 
an efficient algorithm that achieves the secrecy capacity without requiring any quantities to grow 
asymptotically large. 

C. Organization 

This paper is organized as follows. Section II formally introduces the problem and provides
basic definitions and notation. Section III presents our main results. Section IV discusses the
application of our results to secrecy generation by a collection of nodes in the presence of an
eavesdropper. Section V contains the relevant proofs. Section VI delivers the conclusions and
discusses directions for future work.

II. System Model and Definitions 

Before we formally introduce the problem, we establish some notation. Let $\mathbb{N} = \{0, 1, 2, \dots\}$
denote the set of natural numbers. For two sets $A$ and $B$, the relation $A \subset B$ implies that $A$
is a proper subset of $B$ (i.e., $A \subseteq B$ and $A \neq B$). For a set $A$, the corresponding power set is
denoted $2^A := \{B : B \subseteq A\}$. We use the notation $[m]$ to denote the set $\{1, \dots, m\}$.

This paper considers a network of n nodes. The network must be connected, but it need not
be fully connected (i.e., it need not be a complete graph). A graph $\mathcal{G} = (V, E)$ describes the
specific connections in the network, where $V$ is the set of vertices $\{v_i : i \in \{1, \dots, n\}\}$ (each
corresponding to a node) and $E$ is the set of edges connecting nodes. We assume that the edges
in $E$ are undirected, but our results can be extended to directed graphs.

Each node wishes to recover the same k desired packets, and each node begins with a (possibly
empty) subset of the desired packets. Formally, let $P_i \subseteq \{p_1, \dots, p_k\}$ be the (indexed) set of
packets originally available at node $i$, where $\{P_i\}_{i=1}^{n}$ satisfies $\bigcup_{i=1}^{n} P_i = \{p_1, \dots, p_k\}$. Each $p_j \in \mathbb{F}$,
where $\mathbb{F}$ is some finite field (e.g., $\mathbb{F} = \mathrm{GF}(2^m)$). For our purposes, it suffices to assume $|\mathbb{F}| \geq 2n$.
The set of packets initially missing at node $i$ is denoted $P_i^c := \{p_1, \dots, p_k\} \setminus P_i$.

Throughout this paper, we assume that each packet $p_i \in \{p_1, \dots, p_k\}$ is equally likely to be
any element of $\mathbb{F}$. Moreover, we assume that packets are independent of one another. Thus,
no correlation between different packets or prior knowledge about unknown packets can be
exploited.

To simplify notation, we will refer to a given problem instance (i.e., a graph and corresponding
sets of packets available at each node) as a network $\mathcal{T} = \{\mathcal{G}, P_1, \dots, P_n\}$. When no ambiguity
is present, we will refer to a network by $\mathcal{T}$ and omit the implicit dependence on the parameters
$\{\mathcal{G}, P_1, \dots, P_n\}$.

Let the set $\Gamma(i)$ be the neighborhood of node $i$. There exists an edge $e \in E$ connecting two
vertices $v_i, v_j \in V$ iff $i \in \Gamma(j)$. For convenience, we put $i \in \Gamma(i)$. Node $i$ sends (possibly coded)
packets to its neighbors $\Gamma(i)$ over discrete, memoryless, and interference-free channels. In other
words, if node $i$ transmits a message, then every node in $\Gamma(i)$ receives that message. If $S$ is a set
of nodes, then we define $\Gamma(S) = \bigcup_{i \in S} \Gamma(i)$. In a similar manner, we define $\partial(S) = \Gamma(S) \setminus S$ to
be the boundary of the vertices in $S$. An example of sets $S$, $\Gamma(S)$, and $\partial(S)$ is given in Figure 1.
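
As a small illustration of the neighborhood and boundary operators defined above, the following Python sketch computes them from an adjacency map. The function names, the dictionary representation of the graph, and the 5-node cycle used as an example are assumptions made here for illustration only.

```python
def neighborhood(adj, S):
    """Gamma(S): the nodes of S together with all of their neighbors.

    adj maps each node to the set of its neighbors. Following the
    convention that i belongs to Gamma(i), S itself is always included.
    """
    result = set(S)
    for i in S:
        result |= adj[i]
    return result


def boundary(adj, S):
    """The boundary of S: nodes in Gamma(S) that are not in S."""
    return neighborhood(adj, S) - set(S)


# Hypothetical example graph: a cycle on 5 nodes labeled 0..4.
cycle = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
print(neighborhood(cycle, {0}))
print(boundary(cycle, {0, 1}))
```

A transmission by any node in the boundary of S is heard by at least one node of S, which is why boundary sets appear in the cut-set bounds later in the paper.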

This paper seeks to determine the minimum number of transmissions required to achieve
universal recovery (when every node has learned all k packets). We primarily consider the case
where packets are deemed indivisible. In this case, a single transmission by user $i$ consists of
sending a packet (some $z \in \mathbb{F}$) to all nodes $j \in \Gamma(i)$. This motivates the following definition.

Definition 1: Given a network $\mathcal{T}$, the minimum number of transmissions required to achieve
universal recovery is denoted $M^*(\mathcal{T})$.

To clarify this concept, we briefly consider two examples: 

Example 1 (Line Network): Suppose $\mathcal{T}$ is a network of nodes connected along a line as
follows: $V = \{v_1, v_2, v_3\}$, $E = \{(v_1, v_2), (v_2, v_3)\}$, $P_1 = \{p_1\}$, $P_2 = \emptyset$, and $P_3 = \{p_2\}$. Note
that each node must transmit at least once in order for all nodes to recover $\{p_1, p_2\}$, hence
$M^*(\mathcal{T}) \geq 3$. Suppose node 1 transmits $p_1$ and node 3 transmits $p_2$. Then (upon receipt of $p_1$ and
$p_2$ from nodes 1 and 3, respectively) node 2 transmits $p_1 \oplus p_2$, where $\oplus$ indicates addition in the
finite field $\mathbb{F}$. This strategy requires 3 transmissions and allows each user to recover $\{p_1, p_2\}$.
Hence $M^*(\mathcal{T}) = 3$.

Fig. 1. For the given graph, a set of vertices $S$ and its neighborhood $\Gamma(S)$ are depicted. The set $\partial(S)$ (i.e., the boundary of
$S$) consists of the four vertices in $\Gamma(S)$ which are not in $S$.

Example 1 demonstrates a transmission schedule that uses two rounds of communication.
The transmissions by node $i$ in a particular round of communication can depend only on the
information available to node $i$ prior to that round (i.e., $P_i$ and previously received transmissions
from neighboring nodes). In other words, the transmissions are causal. The transmission scheme
employed in Example 1 is illustrated in Figure 2.

Example 2 (Fully Connected Network): Suppose $\mathcal{T}$ is a 3-node fully connected network in
which $\mathcal{G}$ is a complete graph on 3 vertices, and $P_i = \{p_1, p_2, p_3\} \setminus \{p_i\}$. Clearly one transmission is
not sufficient, thus $M^*(\mathcal{T}) \geq 2$. It can be seen that two transmissions suffice: let node 1 transmit
$p_2$, which lets node 2 have $P_2 \cup \{p_2\} = \{p_1, p_2, p_3\}$. Now, node 2 transmits $p_1 \oplus p_3$, allowing nodes
1 and 3 to each recover all three packets. Thus $M^*(\mathcal{T}) = 2$. Since each transmission was only a
function of the packets originally available at the corresponding node, this transmission strategy
can be accomplished in a single round of communication.
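
The two-transmission scheme of Example 2 can be checked mechanically. The sketch below is an illustration under the assumption that packets are represented as Python integers and that field addition is bitwise XOR (i.e., a field of characteristic 2); the function name is hypothetical.

```python
from itertools import product


def example2_recovers(p1, p2, p3):
    """Simulate Example 2: node i initially holds every packet except p_i."""
    t1 = p2        # node 1 (holds p2, p3) broadcasts p2
    t2 = p1 ^ p3   # node 2 (holds p1, p3) broadcasts p1 XOR p3

    truth = (p1, p2, p3)
    # Each node decodes using only its own packets and what it received.
    node1 = (t2 ^ p3, p2, p3)   # recovers p1 by cancelling p3 from t2
    node2 = (p1, t1, p3)        # receives p2 directly
    node3 = (p1, p2, t2 ^ p1)   # recovers p3 by cancelling p1 from t2
    return node1 == truth and node2 == truth and node3 == truth


# Exhaustive check over all single-bit packet values.
all_ok = all(example2_recovers(*bits) for bits in product((0, 1), repeat=3))
print(all_ok)
```

Because each broadcast depends only on packets the sender held initially, both transmissions can occur in the same round.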

Fig. 2. An illustration of the transmission scheme employed in Example 1. During the first time instant, Nodes 1 and 3
broadcast packets $p_1$ and $p_2$, respectively. During the second time instant, Node 2 broadcasts the XOR of packets $p_1$ and $p_2$.
This scheme requires three transmissions and achieves universal recovery.

In the above examples, we notice that the transmission schemes are partially characterized by
a schedule of which nodes transmit during which round of communication. We formalize this
notion with the following definition:

Definition 2 (Transmission Schedule): A set of integers $\{b_i^j : i \in [n], j \in [r], b_i^j \in \mathbb{N}\}$ is called
a transmission schedule for $r$ rounds of communication if node $i$ makes exactly $b_i^j$ transmissions
during communication round $j$.

When the parameters n and r are clear from context, a transmission schedule will be denoted
by the shorthand notation $\{b_i^j\}$. Although finding a transmission schedule that achieves universal
recovery is relatively easy (e.g., each node transmits all packets in its possession at each time
instant), finding one that achieves universal recovery with $M^*(\mathcal{T})$ transmissions can be extremely
difficult. This is demonstrated by the following example:

Example 3 (Optimal Cooperative Data Exchange is NP-Hard): Suppose $\mathcal{T}$ is a network with
$k = 1$ corresponding to a bipartite graph with left and right vertex sets $V_L$ and $V_R$, respectively.
Let $P_i = \{p_1\}$ for each $i \in V_L$, and let $P_i = \emptyset$ for each $i \in V_R$. In this case, $M^*(\mathcal{T})$ is given by
the minimum number of sets in $\{\Gamma(i)\}_{i \in V_L}$ which cover all vertices in $V_R$. Thus, finding $M^*(\mathcal{T})$
is at least as hard as the Minimum Set Cover problem, which is NP-complete [17].
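
To make the reduction in Example 3 concrete, the following Python sketch computes the minimum number of transmissions for a small bipartite instance by exhaustive search over subsets of left nodes. It is a brute-force illustration that takes exponential time, not an efficient algorithm; the function name and the example instance are assumptions introduced here.

```python
from itertools import combinations


def min_transmissions_bipartite(reach, right):
    """Brute-force minimum set cover for the k = 1 bipartite instance.

    reach maps each left node (which holds the packet) to the set of right
    nodes that hear its broadcast; right is the set of nodes missing the
    packet. Returns the smallest number of left nodes whose broadcasts
    cover `right`, or None if some right node has no left neighbor.
    """
    left = list(reach)
    right = set(right)
    for size in range(len(left) + 1):
        for chosen in combinations(left, size):
            covered = set().union(*(reach[i] for i in chosen))
            if covered >= right:
                return size
    return None


# Hypothetical instance: left nodes a..d, right nodes 1..4.
reach = {"a": {1, 2}, "b": {2, 3}, "c": {3, 4}, "d": {1, 4}}
print(min_transmissions_bipartite(reach, {1, 2, 3, 4}))
```

Right nodes never need to transmit in this instance, since all of their neighbors already hold the single packet.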

Fig. 3. An example of a sequence $(S_0, S_1, S_2) \in \mathcal{S}^{(2)}(\mathcal{G})$ for a particular choice of graph $\mathcal{G}$.

Several of our results are stated in the context of randomly distributed packets. Assume
$0 < q < 1$ is given. Our model is essentially that each packet is available independently at each node
with probability $q$. However, we must condition on the event that each packet is available to at
least one node. Thus, when packets are randomly distributed, the underlying probability measure
is given by

$$\Pr\left[ p_i \in \bigcap_{j \in S} P_j \setminus \bigcup_{j \notin S} P_j \right] = \frac{q^{|S|} (1-q)^{n-|S|}}{1 - (1-q)^{n}} \tag{1}$$

for all $i \in [k]$ and all nonempty $S \subseteq V = [n]$.
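
The random packet model can be simulated by rejection sampling: draw the set of holders of each packet independently with probability q, and redraw whenever no node holds the packet. The sketch below is an illustration; the function name, the 0-based indexing of nodes and packets, and the use of Python's `random.Random` are choices made here rather than anything prescribed by the paper.

```python
import random


def sample_packet_sets(n, k, q, rng):
    """Draw P[0..n-1] (sets of packet indices) with each packet held by at
    least one node. Requires 0 < q <= 1, otherwise the loop never ends."""
    P = [set() for _ in range(n)]
    for pkt in range(k):
        while True:
            holders = [i for i in range(n) if rng.random() < q]
            if holders:  # condition on the packet being available somewhere
                break
        for i in holders:
            P[i].add(pkt)
    return P


rng = random.Random(0)
P = sample_packet_sets(n=4, k=50, q=0.3, rng=rng)
print([len(Pi) for Pi in P])
```

Conditioning by redrawing leaves the relative probabilities of the nonempty holder sets unchanged, so each nonempty set S is selected with probability proportional to q^|S| (1-q)^(n-|S|).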

Finally, we introduce one more definition which links the network topology with the number 
of communication rounds, r. 

Definition 3: For a graph $\mathcal{G} = (V, E)$ on n vertices, define $\mathcal{S}^{(r)}(\mathcal{G}) \subseteq (2^V)^{r+1}$ as follows:
$(S_0, S_1, \dots, S_r) \in \mathcal{S}^{(r)}(\mathcal{G})$ if and only if the sets $\{S_i\}_{i=0}^{r}$ satisfy the following two conditions:

$\emptyset \subset S_i \subset V$ for each $0 \leq i \leq r$, and

$S_{i-1} \subseteq S_i \subseteq \Gamma(S_{i-1})$ for each $1 \leq i \leq r$.

In words, any element in $\mathcal{S}^{(r)}(\mathcal{G})$ is a nested sequence of subsets of vertices of $\mathcal{G}$. Moreover, the
constraint that each set in the sequence is contained in its predecessor's neighborhood implies
that the sets cannot expand too quickly relative to the topology of $\mathcal{G}$.

To make the definition of $\mathcal{S}^{(r)}(\mathcal{G})$ more concrete, we have illustrated a sequence $(S_0, S_1, S_2) \in
\mathcal{S}^{(2)}(\mathcal{G})$ for a particular choice of graph $\mathcal{G}$ in Figure 3.
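
The two conditions of Definition 3 translate directly into a membership test. The following Python sketch is illustrative only; the function name, the adjacency-map representation, and the 5-node line graph are assumptions made here.

```python
def in_nested_family(adj, seq):
    """Check whether seq = (S_0, ..., S_r) satisfies Definition 3.

    adj maps each node to the set of its neighbors; every S_i must be a
    nonempty proper subset of the vertex set, and each set must contain
    its predecessor while staying inside the predecessor's neighborhood.
    """
    V = set(adj)
    sets = [set(S) for S in seq]
    # Condition 1: each S_i is a nonempty proper subset of V.
    if any(not S or not S < V for S in sets):
        return False
    # Condition 2: S_{i-1} <= S_i <= Gamma(S_{i-1}).
    for prev, cur in zip(sets, sets[1:]):
        gamma_prev = prev.union(*(adj[i] for i in prev))
        if not (prev <= cur <= gamma_prev):
            return False
    return True


# Hypothetical example: a line graph 0 - 1 - 2 - 3 - 4.
line = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(in_nested_family(line, [{0}, {0, 1}, {0, 1, 2}]))
print(in_nested_family(line, [{0}, {0, 2}]))
```

In the second call, node 2 is not within one hop of {0}, so the sequence grows faster than the topology allows and is rejected.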



III. Main Results

In this section, we present our main results. Proofs are deferred until Section V.



A. Necessary and Sufficient Conditions for Universal Recovery 

First, we provide necessary and sufficient conditions for achieving universal recovery in a
network $\mathcal{T}$. It turns out that these conditions are characterized by a particular set of transmission
schedules $\mathcal{R}_r(\mathcal{T})$ which we define as follows:

Definition 4: For a network $\mathcal{T} = \{\mathcal{G}, P_1, \dots, P_n\}$, define the region $\mathcal{R}_r(\mathcal{T}) \subseteq \mathbb{N}^{n \times r}$ to be the
set of all transmission schedules $\{b_i^j\}$ satisfying:

$$\sum_{j=1}^{r} \sum_{i \in S_j^c \cap \Gamma(S_{j-1})} b_i^{(r+1-j)} \geq \left| \bigcap_{i \in S_r} P_i^c \right|$$

for each $(S_0, \dots, S_r) \in \mathcal{S}^{(r)}(\mathcal{G})$.

Theorem 1: For a network $\mathcal{T}$, a transmission schedule $\{b_i^j\}$ permits universal recovery in $r$
rounds of communication if and only if $\{b_i^j\} \in \mathcal{R}_r(\mathcal{T})$.

Theorem 1 reveals that the set of transmission schedules permitting universal recovery is
characterized precisely by the region $\mathcal{R}_r(\mathcal{T})$. In fact, given a transmission schedule in $\mathcal{R}_r(\mathcal{T})$, a
corresponding coding scheme that achieves universal recovery can be computed in polynomial
time using the algorithm in [18] applied to the network coding graph discussed in the proof of
Theorem 1. Alternatively, one could employ random linear network coding over a sufficiently
large field size [19]. If transmissions are made in a manner consistent with a schedule in $\mathcal{R}_r(\mathcal{T})$,
universal recovery will be achieved with high probability.

Thus, the problem of achieving universal recovery with the minimum number of transmissions
reduces to solving a combinatorial optimization problem over $\mathcal{R}_r(\mathcal{T})$. As this problem was shown
to be NP-hard in Example 3, we do not attempt to solve it in its most general form. Instead, we
apply Theorem 1 to obtain surprisingly simple characterizations for several cases of interest.

Before proceeding, we provide a quick example showing how the traditional cut-set bounds
can be recovered from Theorem 1.

Example 4 (Cut-Set Bounds): Consider the constraint defining $\mathcal{R}_r(\mathcal{T})$ in which the nested
subsets that form $\mathcal{S}^{(r)}(\mathcal{G})$ are all identical. That is, $(S, S, \dots, S) \in \mathcal{S}^{(r)}(\mathcal{G})$ for some nonempty
$S \subset V$. We see that any transmission schedule $\{b_i^j\} \in \mathcal{R}_r(\mathcal{T})$ must satisfy the familiar cut-set
bounds:

$$\sum_{j=1}^{r} \sum_{i \in \partial(S)} b_i^{j} \geq \left| \bigcap_{i \in S} P_i^c \right| \tag{2}$$

In words, the total number of packets that flow into the set of nodes $S$ must be greater than or
equal to the number of packets that the nodes in $S$ are collectively missing.
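
The cut-set bounds can be checked by enumerating every nonempty proper subset S of the vertex set. The Python sketch below does this for small networks; it assumes each node's transmissions have already been summed over all rounds, and the function name and data layout are illustrative choices made here. The check is exponential in n and intended only for toy instances.

```python
from itertools import combinations


def satisfies_cut_set(adj, P, k, x):
    """Check the cut-set bounds for per-node transmission totals x.

    adj: node -> set of neighbors; P: node -> set of held packet indices
    (0..k-1); x: node -> total number of transmissions over all rounds.
    """
    V = list(adj)
    all_packets = set(range(k))
    for size in range(1, len(V)):  # nonempty proper subsets S
        for subset in combinations(V, size):
            S = set(subset)
            gamma = S.union(*(adj[i] for i in S))
            inflow = sum(x[i] for i in gamma - S)  # sent by boundary nodes
            # Packets that every node in S is missing.
            missing = all_packets.intersection(*(all_packets - P[i] for i in S))
            if inflow < len(missing):
                return False
    return True


# The line network of Example 1, with packets indexed 0 and 1.
adj = {1: {2}, 2: {1, 3}, 3: {2}}
P = {1: {0}, 2: set(), 3: {1}}
print(satisfies_cut_set(adj, P, 2, {1: 1, 2: 1, 3: 1}))
print(satisfies_cut_set(adj, P, 2, {1: 1, 2: 0, 3: 1}))
```

Passing this check is necessary for universal recovery in any network, but for general multihop networks it is not sufficient on its own.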
B. Fully Connected Networks

When $\mathcal{T}$ is a fully connected network, the graph $\mathcal{G}$ is a complete graph on n vertices. This is
perhaps one of the most practically important cases to consider. For example, in a wired computer
network, clients can multicast their messages to all other terminals which are cooperatively
exchanging data. In wireless networks, broadcast is a natural transmission mode. Indeed, there
are protocols tailored specifically to wireless networks which support reliable network-wide
broadcast capabilities (cf. [20]-[23]). It is fortunate, then, that the cooperative data exchange
problem can be solved in polynomial time for fully connected networks:

Theorem 2: For a fully connected network $\mathcal{T}$, a transmission schedule requiring only $M^*(\mathcal{T})$
transmissions can be computed in polynomial time. Necessary and sufficient conditions for
universal recovery in this case are given by the cut-set constraints (2). Moreover, a single round
of communication is sufficient to achieve universal recovery with $M^*(\mathcal{T})$ transmissions.

For the fully connected network in Example 2, we remarked that only one round of
transmission was required. Theorem 2 states that this trend extends to any fully connected network.

An algorithm for solving the cooperative data exchange problem for fully connected networks
is presented in Appendix A. We remark that the algorithm is sufficiently general that it can also
solve the cooperative data exchange problem where the objective is to minimize the weighted
sum of nodes' transmissions.
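
Since Theorem 2 states that the cut-set constraints are necessary and sufficient in the fully connected case, the optimal number of transmissions for a very small network can be found by exhaustive search over integer vectors. The Python sketch below is such a brute-force check and is not the polynomial-time submodular algorithm of the paper; its running time is exponential in n, and the function name and data layout are assumptions made here.

```python
from itertools import combinations, product


def min_schedule_fully_connected(P, k):
    """Brute-force an optimal per-node transmission count.

    P is a list of sets of packet indices (0..k-1), one per node, whose
    union is assumed to be all k packets. Returns a tuple x, with x[i]
    the number of transmissions by node i, minimizing sum(x) subject to
    the cut-set constraints.
    """
    n = len(P)
    full = set(range(k))
    cuts = []
    for size in range(1, n):  # nonempty proper subsets S
        for S in combinations(range(n), size):
            need = len(full.intersection(*(full - P[i] for i in S)))
            outside = [i for i in range(n) if i not in S]  # boundary of S
            cuts.append((outside, need))
    best = None
    # No constraint asks for more than k packets, so x[i] <= k suffices.
    for x in product(range(k + 1), repeat=n):
        if all(sum(x[i] for i in outside) >= need for outside, need in cuts):
            if best is None or sum(x) < sum(best):
                best = x
    return best


# Example 2: node i holds every packet except packet i.
schedule = min_schedule_fully_connected([{1, 2}, {0, 2}, {0, 1}], 3)
print(schedule, sum(schedule))
```

For Example 2 the search returns a schedule with two transmissions in total, matching the value derived by hand.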

Although Theorem 2 applies to arbitrary sets of packets $P_1, \dots, P_n$, it is insightful to consider
the case where packets are randomly distributed in the network. In this case, the minimum
number of transmissions required for universal recovery converges in probability to a simple
function of the (random) sets $P_1, \dots, P_n$.

Theorem 3: If $\mathcal{T}$ is a fully connected network and packets are randomly distributed, then

$$M^*(\mathcal{T}) = \left\lceil \frac{1}{n-1} \sum_{i=1}^{n} |P_i^c| \right\rceil$$

with probability approaching 1 as the number of packets $k \to \infty$.
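
As a numerical illustration, the sketch below evaluates the expression in Theorem 3, read here as the ceiling of the total number of missing packets divided by n - 1. That reading of the statement, the function name, and the data layout are assumptions; the theorem itself only asserts equality with high probability for randomly distributed packets, so the function computes the expression rather than the optimal value.

```python
def theorem3_expression(P, k):
    """Ceiling of (1 / (n - 1)) * sum_i |P_i^c| for n >= 2 nodes.

    P is a list of sets of packet indices (0..k-1), one per node.
    """
    n = len(P)
    total_missing = sum(k - len(Pi) for Pi in P)
    return -(-total_missing // (n - 1))  # integer ceiling division


# Example 2: three nodes, each missing exactly one of three packets.
print(theorem3_expression([{1, 2}, {0, 2}, {0, 1}], 3))
```

The expression is always a lower bound in a fully connected network: summing the singleton cut-set constraints over all nodes counts each node's transmissions n - 1 times.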

C. d-Regular Networks 

Given that precise results can be obtained for fully connected networks, it is natural to ask
whether these results can be extended to a larger class of networks which includes fully connected
networks as a special case. In this section, we partially answer this question in the affirmative.
To this end, we define d-regular networks.

Definition 5 (d-Regular Networks): A network $\mathcal{T}$ is said to be d-regular if $|\partial(i)| = d$ for each
$i \in V$ and $|\partial(S)| \geq d$ for each nonempty $S \subset V$ with $|S| \leq n - d$. In other words, a network $\mathcal{T}$
is d-regular if the associated graph $\mathcal{G}$ is d-regular and d-vertex-connected.

Immediately, we see that the class of d-regular networks includes fully connected networks as
a special case with $d = n - 1$. Further, the class of d-regular networks includes many frequently
studied network topologies (e.g., cycles, grids on tori, etc.).
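
For small graphs, the two conditions of Definition 5 can be verified by direct enumeration. The Python sketch below assumes the conditions are stated in terms of boundary sizes (each node has exactly d neighbors, and every nonempty set of at most n - d nodes has at least d boundary nodes); the function name and example graphs are illustrative.

```python
from itertools import combinations


def is_d_regular_network(adj, d):
    """Brute-force check of the d-regular network conditions.

    adj maps each node to the set of its neighbors. The check enumerates
    all subsets of size at most n - d, so it is only meant for toy graphs.
    """
    V = list(adj)
    n = len(V)
    # Every node must have exactly d neighbors other than itself.
    if any(len(adj[i] - {i}) != d for i in V):
        return False
    # Every nonempty S with |S| <= n - d needs a boundary of size >= d.
    for size in range(1, n - d + 1):
        for subset in combinations(V, size):
            S = set(subset)
            boundary = S.union(*(adj[i] for i in S)) - S
            if len(boundary) < d:
                return False
    return True


cycle5 = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
complete4 = {i: {j for j in range(4) if j != i} for i in range(4)}
print(is_d_regular_network(cycle5, 2), is_d_regular_network(complete4, 3))
```

A graph made of two disjoint triangles has all degrees equal to 2 but fails the boundary condition, which shows why the second condition is needed.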

Unfortunately, the deterministic algorithm of Theorem 2 does not appear to extend to d-regular
networks. However, a slightly weaker concentration result similar to Theorem 3 can be obtained
when packets are randomly distributed. Before stating this result, consider the following Linear
Program (LP) with variable vector $x \in \mathbb{R}^n$ defined for a network $\mathcal{T}$:

$$\text{minimize} \quad \sum_{i=1}^{n} x_i \tag{3}$$

$$\text{subject to:} \quad \sum_{i \in \partial(j)} x_i \geq |P_j^c| \quad \text{for each } j \in V. \tag{4}$$

Let $M_{LP}(\mathcal{T})$ denote the optimal value of this LP. Interpreting $x_i$ as $\sum_j b_i^j$, the constraints in the
LP are a subset of the cut-set constraints of (2), which are a subset of the necessary constraints
for universal recovery given in Theorem 1. Furthermore, the integer constraints on the $x_i$'s are
relaxed. Thus $M_{LP}(\mathcal{T})$ certainly bounds $M^*(\mathcal{T})$ from below. Surprisingly, if $\mathcal{T}$ is a d-regular
network and the packets are randomly distributed, $M^*(\mathcal{T})$ is very close to this lower bound with
high probability:

Theorem 4: If $\mathcal{T}$ is a d-regular network and the packets are randomly distributed, then

$$M^*(\mathcal{T}) < M_{LP}(\mathcal{T}) + n$$

with probability approaching 1 as the number of packets $k \to \infty$.

We make two important observations. First, the length of the interval in which $M^*(\mathcal{T})$ is
concentrated is independent of k. Hence, even though the number of packets k may be extremely
large, $M^*(\mathcal{T})$ can be estimated accurately. Second, as k grows large, $M^*(\mathcal{T})$ is dominated by
the local topology of $\mathcal{T}$. This is readily seen since the constraints defining $M_{LP}(\mathcal{T})$ correspond
only to nodes' immediate neighborhoods. The importance of the local neighborhood was also
seen in [24], where network coding capacity for certain random networks is shown to concentrate
around the expected number of nearest neighbors of the source and the terminals.

D. Large (Divisible) Packets 

We now return to general networks with arbitrarily distributed packets. However, we now
consider the case where packets are "large" and can be divided into several smaller pieces (e.g.,
packets actually correspond to large files). To formalize this, assume that each packet can be
partitioned into t chunks of equal size, and transmissions can consist of a single chunk (as
opposed to an entire packet). In this case, we say the packets are t-divisible. To illustrate this
point more clearly, we return to Example 2, this time considering 2-divisible packets.

Example 5 (2-Divisible Packets): Let $\mathcal{T}$ be the network of Example 2 and split each packet
into two halves: $p_i \to (p_i^{(1)}, p_i^{(2)})$. Denote this new network $\mathcal{T}'$ with corresponding sets of packets:

$$P_i' = \{p_1^{(1)}, p_1^{(2)}, p_2^{(1)}, p_2^{(2)}, p_3^{(1)}, p_3^{(2)}\} \setminus \{p_i^{(1)}, p_i^{(2)}\}.$$

Three chunk transmissions allow universal recovery as follows: Node 1 transmits $p_2^{(2)} \oplus p_3^{(1)}$.
Node 2 transmits $p_1^{(1)} \oplus p_3^{(2)}$. Node 3 transmits $p_1^{(2)} \oplus p_2^{(1)}$. It is readily verified from (2) that 3
chunk transmissions are required to permit universal recovery. Thus, $M^*(\mathcal{T}') = 3$. Hence, if we
were allowed to split the packets of Example 2 into two halves, it would suffice to transmit 3
chunks. Normalizing the number of transmissions by the number of chunks per packet, we say
that universal recovery can be achieved with 1.5 packet transmissions.
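
A three-chunk scheme of the kind described in Example 5 can be verified exhaustively. In the Python sketch below, each node XORs one half of each of its two packets, with the halves chosen so that every node receives both halves of its missing packet; this particular assignment of halves is an assumption made for illustration, as are the function name and the use of bitwise XOR as field addition.

```python
from itertools import product


def example5_recovers(c):
    """c maps (packet, half) -> value, for packets 1..3 and halves 1..2.

    Node i initially holds both halves of every packet except packet i.
    """
    t1 = c[2, 2] ^ c[3, 1]  # node 1 holds packets 2 and 3
    t2 = c[1, 1] ^ c[3, 2]  # node 2 holds packets 1 and 3
    t3 = c[1, 2] ^ c[2, 1]  # node 3 holds packets 1 and 2

    # Each node cancels a chunk it already holds from each received chunk.
    node1 = (t2 ^ c[3, 2], t3 ^ c[2, 1]) == (c[1, 1], c[1, 2])
    node2 = (t3 ^ c[1, 2], t1 ^ c[3, 1]) == (c[2, 1], c[2, 2])
    node3 = (t1 ^ c[2, 2], t2 ^ c[1, 1]) == (c[3, 1], c[3, 2])
    return node1 and node2 and node3


keys = [(p, h) for p in (1, 2, 3) for h in (1, 2)]
all_ok = all(
    example5_recovers(dict(zip(keys, bits)))
    for bits in product((0, 1), repeat=6)
)
print(all_ok)
```

Three chunk transmissions at two chunks per packet correspond to 1.5 normalized packet transmissions, compared with 2 when packets are indivisible.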

Motivated by this example, define $M_t^*(\mathcal{T})$ to be the minimum number of (normalized) packet
transmissions required to achieve universal recovery in the network $\mathcal{T}$ when packets are
t-divisible. For the network $\mathcal{T}$ in Example 2, we saw above that $M_2^*(\mathcal{T}) = 1.5$.

It turns out that if packets are t-divisible and t is large, the cut-set bounds (2) are "nearly sufficient"
for achieving universal recovery. To see this, let $M_{\text{cut-set}}(\mathcal{T})$ be the optimal value of the LP:

$$\text{minimize} \quad \sum_{i=1}^{n} x_i \tag{5}$$

$$\text{subject to:} \quad \sum_{i \in \partial(S)} x_i \geq \left| \bigcap_{i \in S} P_i^c \right| \quad \text{for each nonempty } S \subset V. \tag{6}$$



Clearly $M_{\text{cut-set}}(\mathcal{T}) \leq M_t^*(\mathcal{T})$ for any network $\mathcal{T}$ with t-divisible packets because the LP
producing $M_{\text{cut-set}}(\mathcal{T})$ relaxes the integer constraints and is constrained only by (2) rather than
the full set of constraints given in Theorem 1. However, there exist transmission schedules which
can approach this lower bound. Stated more precisely:

Theorem 5: For any network $\mathcal{T}$, the minimum number of (normalized) packet transmissions
required to achieve universal recovery with t-divisible packets satisfies

$$\lim_{t \to \infty} M_t^*(\mathcal{T}) = M_{\text{cut-set}}(\mathcal{T}).$$

Precisely how large t is required to be in order to approach $M_{\text{cut-set}}(\mathcal{T})$ within a specified
tolerance is not clear for general networks. However, an immediate consequence of Theorem 3
is that $t = n - 1$ is sufficient to achieve this lower bound with high probability when packets
are randomly distributed in a fully connected network.

Finally, we remark that it is a simple exercise to construct examples where the cut-set bounds 
alone are not sufficient to characterize transmission schedules permitting universal recovery when 
packets are not divisible (e.g., a 4-node line network with packets $p_1$ and $p_2$ at the left-most and
right-most nodes, respectively). Thus, t-divisibility of packets provides the additional degrees of 
freedom necessary to approach the cut-set bounds more closely. 

E. Remarks 

One interesting consequence of our results is that splitting packets does not significantly 
reduce the required number of packet-transmissions for many scenarios. Indeed, at most one 
transmission can be saved if the network is fully connected (under any distribution of packets). 
If the network is d-regular, we can expect to save fewer than n transmissions if packets are
randomly distributed (in fact, at most one transmission per node). It seems possible that this 
result could be strengthened to include arbitrary distributions of packets in d-regular networks 
(as opposed to randomly distributed packets), but a proof has not been found. 

The limited value of dividing packets has practical ramifications since there is usually some
additional communication overhead associated with dividing packets (e.g., additional headers
are required for each transmitted chunk). Thus, if the packets are very large, say each packet is
a video file, our results imply that entire coded packets can be transmitted without significant
loss, avoiding any additional overhead incurred by dividing packets.

IV. An Application: Secrecy Generation

In this section, we consider the setup of the cooperative data exchange problem for a fully 
connected network T, but we consider a different goal. In particular, we wish to generate a secret- 
key among the nodes that cannot be derived by an eavesdropper privy to all of the transmissions 
among nodes. Also, like the nodes themselves, the eavesdropper is assumed to know the indices 
of the packets initially available to each node. The goal is to generate the maximum amount of 
"secrecy" that cannot be determined by the eavesdropper. 

The theory behind secrecy generation among multiple terminals was originally established
in [14] for a very general class of problems. Our results should be interpreted as a practical
application of the theory originally developed in [14]. Indeed, our results and proofs are special
cases of those in [14] which have been streamlined to deal with the scenario under consideration.
The aim of the present section is to show how secrecy can be generated in a practical scenario. In
particular, we show that it is possible to efficiently generate the maximum amount of secrecy (as
established in [14]) among nodes in a fully connected network $\mathcal{T} = \{\mathcal{G}, P_1, \dots, P_n\}$. Moreover,
we show that this is possible in the non-asymptotic regime (i.e., there are no $\epsilon$'s and we don't
require the number of packets or nodes to grow arbitrarily large). Finally, we note that it is
possible to generate perfect secrecy instead of $\epsilon$-secrecy without any sacrifice.

A. Practical Secrecy Results 

In this subsection, we state two results on secrecy generation. Proofs are again postponed until
Section V. We begin with some definitions.² Let $\mathbf{F}$ denote the set of all transmissions (all of
which are available to the eavesdropper by definition). A function $K$ of the packets $\{p_1, \dots, p_k\}$
in the network is called a secret key (SK) if $K$ is recoverable by all nodes after observing $\mathbf{F}$,
and it satisfies the (perfect) secrecy condition

$$I(K; \mathbf{F}) = 0, \tag{7}$$

and the uniformity condition

$$\Pr(K = \text{key}) = \frac{1}{|\mathcal{K}|} \quad \text{for all key} \in \mathcal{K}, \tag{8}$$

where $\mathcal{K}$ is the alphabet of possible keys.

² We attempt to follow the notation of [14] where appropriate.

We define $C_{SK}(P_1, \dots, P_n)$ to be the secret-key capacity for a particular distribution of
packets. We will drop the notational dependence on $P_1, \dots, P_n$ where it doesn't cause confusion.
By this we mean that a secret-key $K$ can be generated if and only if $\mathcal{K} = \mathbb{F}^{C_{SK}}$. In other words,
the nodes can generate at most $C_{SK}$ packets worth of secret-key. Our first result of this section
is the following:

Theorem 6: The secret-key capacity is given by: $C_{SK}(P_1, \dots, P_n) = k - M^*(\mathcal{T})$.

Next, consider the related problem where a subset $D \subset V$ of nodes is compromised. In this
problem, the eavesdropper has access to $\mathbf{F}$ and $P_i$ for $i \in D$. In this case, the secret-key should
also be kept hidden from the nodes in $D$ (or else the eavesdropper could also recover it). Thus,
for a subset of nodes $D$, let $P_D = \bigcup_{i \in D} P_i$, and call $K$ a private-key (PK) if it is a secret-key
which is only recoverable by the nodes in $V \setminus D$, and also satisfies the stronger secrecy condition:

$$I(K; \mathbf{F}, P_D) = 0. \tag{9}$$

Similar to above, define $C_{PK}(P_1, \dots, P_n, D)$ to be the private-key capacity for a particular
distribution of packets and subset of nodes $D$. Again, we mean that a private-key $K$ can be
generated if and only if $\mathcal{K} = \mathbb{F}^{C_{PK}}$. In other words, the nodes in $V \setminus D$ can generate at most
$C_{PK}$ packets worth of private-key. Note that, since $P_D$ is known to the eavesdropper, each node
$i \in D$ can transmit its respective set of packets $P_i$ without any loss of secrecy capacity.

Define a new network $\mathcal{T}_D = \{\mathcal{G}_D, \{P_i'\}_{i \in V \setminus D}\}$ as follows. Let $\mathcal{G}_D$ be the complete graph
on $V \setminus D$, and let $P_i' = P_i \setminus P_D$ for each $i \in V \setminus D$. Thus, $\mathcal{T}_D$ is a fully connected network with
$n - |D|$ nodes and $k - |P_D|$ packets. Our second result of this section is the following:

Theorem 7: The private-key capacity is given by:

$$C_{PK}(P_1, \dots, P_n, D) = (k - |P_D|) - M^*(\mathcal{T}_D).$$

The basic idea for private-key generation is that the users in $V \setminus D$ should generate a secret-key
from $\{p_1, \dots, p_k\} \setminus P_D$.

By the definitions of the SK and PK capacities, Theorem 2 implies that it is possible to
compute these capacities efficiently. Moreover, as we will see in the achievability proofs, these
capacities can be achieved by performing coded cooperative data exchange amongst the nodes.
Thus, the algorithm developed in Appendix A combined with the algorithm in [18] can be
employed to efficiently solve the secrecy generation problem we consider.

We conclude this subsection with an example to illustrate the results.

Example 6: Consider again the network of Example 2 and assume $\mathbb{F} = \{0, 1\}$ (i.e., each
packet is a single bit). The secret-key capacity for this network is 1 bit. After performing
universal recovery, the eavesdropper knows $p_2$ and the parity $p_1 \oplus p_3$. A perfect secret-key is
$K = p_1$ (we could alternatively use $K = p_3$). If any of the nodes are compromised by the
eavesdropper, the private-key capacity is 0.

We remark that the secret-key in the above example can in fact be attained by all nodes using 
only one transmission (i.e., universal recovery is not a prerequisite for secret-key generation). 
However, it remains true that only one bit of secrecy can be generated. 
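The claims of Example 6 can be checked by exhaustive enumeration. The following is a minimal sketch, assuming three i.i.d. uniform bit packets and the public communication $(p_2, p_1 \oplus p_3)$ stated in the example; the function names `is_perfect_secret_key` and `eavesdropped` are illustrative and not from the paper. It tests the secrecy condition (7) (independence of $K$ and $\mathbf{F}$) and the uniformity condition (8).

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def is_perfect_secret_key(key_fn, comm_fn, num_packets=3):
    """Check I(K;F) = 0 and uniformity of K by enumerating all bit packets."""
    joint = Counter()
    total = 0
    for packets in product((0, 1), repeat=num_packets):
        joint[(key_fn(packets), comm_fn(packets))] += 1
        total += 1
    key_counts, comm_counts = Counter(), Counter()
    for (key, comm), count in joint.items():
        key_counts[key] += count
        comm_counts[comm] += count
    # Independence: the joint pmf must factor into the product of marginals.
    independent = all(
        Fraction(joint[(key, comm)], total)
        == Fraction(key_counts[key], total) * Fraction(comm_counts[comm], total)
        for key in key_counts
        for comm in comm_counts
    )
    # Uniformity: every key value is equally likely.
    uniform = len(set(key_counts.values())) == 1
    return independent and uniform


def eavesdropped(p):
    # Public communication assumed from Example 6: p2 and the parity p1 XOR p3.
    return (p[1], p[0] ^ p[2])


print(is_perfect_secret_key(lambda p: p[0], eavesdropped))          # K = p1
print(is_perfect_secret_key(lambda p: p[2], eavesdropped))          # K = p3
print(is_perfect_secret_key(lambda p: (p[0], p[2]), eavesdropped))  # two-bit key
```

Under these assumptions the first two checks succeed and the two-bit candidate fails, consistent with the statement that only one bit of secrecy can be generated.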

V. Proofs of Main Results 

A. Necessary and Sufficient Conditions for Universal Recovery 

Proof of Theorem 1: This proof is accomplished by reducing the problem at hand to
an instance of a single-source network coding problem and invoking the Max-Flow Min-Cut
Theorem for network information flow [4].

First, fix the number of communication rounds $r$ to be large enough to permit universal
recovery. For a network $\mathcal{T}$, construct the network-coding graph $\mathcal{G}^{NC} = (V_{NC}, E_{NC})$ as follows.
The vertex set, $V_{NC}$, is defined as:

$$V_{NC} = \{s, u_1, \dots, u_k\} \cup \bigcup_{j=0}^{r} \{v_1^j, \dots, v_n^j\} \cup \bigcup_{j=1}^{r} \{w_1^j, \dots, w_n^j\}.$$

The edge set, $E_{NC}$, consists of directed edges and is constructed as follows:

• For each $i \in [k]$, there is an edge of unit capacity from $s$ to $u_i$.

• If $p_i \in P_j$, then there is an edge of infinite capacity from $u_i$ to $v_j^0$.

• For each $j \in [r]$ and each $i \in [n]$, there is an edge of infinite capacity from $v_i^{j-1}$ to $v_i^j$.

• For each $j \in [r]$ and each $i \in [n]$, there is an edge of capacity $b_i^j$ from $v_i^{j-1}$ to $w_i^j$.

• For each $j \in [r]$ and each $i \in [n]$, there is an edge of infinite capacity from $w_i^j$ to $v_{i'}^j$ iff
$i' \in \Gamma(i)$.

The interpretation of this graph is as follows: the vertex $u_i$ is introduced to represent packet
$p_i$, the vertex $v_i^j$ represents node $i$ after the $j$-th round of communication, and the vertex $w_i^j$
represents the broadcast of node $i$ during the $j$-th round of communication. If the $b_i^j$'s are chosen
such that the graph $\mathcal{G}^{NC}$ admits a network coding solution which supports a multicast of $k$ units
from $s$ to $\{v_1^r, \dots, v_n^r\}$, then this network coding solution also solves the universal recovery
problem for the network $\mathcal{T}$ when node $i$ is allowed to make at most $b_i^j$ transmissions during the
$j$-th round of communication. The graph $\mathcal{G}^{NC}$ corresponding to the line network of Example 1
is given in Figure 4.

Footnote: An edge of unit capacity can carry one field element $z \in \mathbb{F}$ per unit time.

Fig. 4. The graph $\mathcal{G}^{NC}$ corresponding to the line network of Example 1. Edges represented by broken lines have infinite
capacity. Edges with finite capacities are labeled with the corresponding capacity value.
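The construction above can be sketched in code. The following is a minimal illustration, not the authors' implementation: it builds the network-coding graph for a given schedule and uses the max-flow min-cut criterion for network information flow [4] invoked in this proof, namely that a multicast of $k$ units is supported when the max flow from $s$ to every sink $v_i^r$ is at least $k$. The three-node line network, the 0-based indexing, and all function names are assumptions made for the example.

```python
from collections import deque

INF = float("inf")


def build_graph(packets, neighbors, b, k):
    """Capacity map of the network-coding graph; b[j][i] is node i's budget in round j+1."""
    cap = {}

    def add(u, v, c):
        cap.setdefault(u, {})[v] = c

    n, r = len(packets), len(b)
    for p in range(k):
        add("s", ("u", p), 1)                      # unit-capacity edge s -> u_p
        for i in range(n):
            if p in packets[i]:
                add(("u", p), ("v", i, 0), INF)    # node i initially holds packet p
    for j in range(1, r + 1):
        for i in range(n):
            add(("v", i, j - 1), ("v", i, j), INF)          # memory edge
            add(("v", i, j - 1), ("w", i, j), b[j - 1][i])  # broadcast budget
            for i2 in neighbors[i]:
                add(("w", i, j), ("v", i2, j), INF)         # delivery to neighbors
    return cap


def max_flow(cap, s, t):
    """Edmonds-Karp max flow on a dict-of-dicts capacity map."""
    res = {u: dict(nb) for u, nb in cap.items()}
    for u, nb in cap.items():
        for v in nb:
            res.setdefault(v, {}).setdefault(u, 0)  # reverse residual edges
    flow = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in res[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        bottleneck, v = INF, t
        while parent[v] is not None:
            bottleneck = min(bottleneck, res[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:
            u = parent[v]
            res[u][v] -= bottleneck
            res[v][u] += bottleneck
            v = u
        flow += bottleneck


def universal_recovery_possible(packets, neighbors, b, k):
    cap = build_graph(packets, neighbors, b, k)
    r = len(b)
    return all(max_flow(cap, "s", ("v", i, r)) >= k for i in range(len(packets)))


# Hypothetical line network 0 - 1 - 2 with k = 2 packets held by the end nodes.
packets = [{0}, set(), {1}]
neighbors = [{1}, {0, 2}, {1}]
print(universal_recovery_possible(packets, neighbors, [[1, 0, 1], [0, 1, 0]], 2))
print(universal_recovery_possible(packets, neighbors, [[1, 0, 1], [0, 0, 0]], 2))
```

In this toy instance the first schedule (the end nodes transmit in round one, the middle node transmits once in round two) passes the check, while removing the middle node's transmission makes it fail.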

We now formally prove the equivalence of the network coding problem on $\mathcal{G}^{NC}$ and the
universal recovery problem defined by $\mathcal{T}$.

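Suppose a set of encoding functions $\{f_i^j\}$ and a set of decoding functions $\{\phi_i\}$ describe a
transmission strategy which solves the universal recovery problem for a network $\mathcal{T}$ in $r$ rounds
of communication. Let $b_i^j$ be the number of transmissions made by node $i$ during the $j$-th round
of communication, and let $\mathcal{I}_i^j$ be all the information known to node $i$ prior to the $j$-th round of
communication (e.g. $\mathcal{I}_i^1 = P_i$). The function $f_i^j$ is the encoding function for user $i$ during the
$j$-th round of communication (i.e. $f_i^j(\mathcal{I}_i^j) \in \mathbb{F}^{b_i^j}$), and the decoding functions satisfy:

$$\phi_i\Big(\mathcal{I}_i^r, \bigcup_{i' \in \Gamma(i)} \{f_{i'}^r(\mathcal{I}_{i'}^r)\}\Big) = \{p_1, \dots, p_k\}.$$

Note that, given the encoding functions and the $P_i$'s, the $\mathcal{I}_i^j$'s can be defined recursively as:

$$\mathcal{I}_i^{j+1} = \mathcal{I}_i^j \cup \bigcup_{i' \in \Gamma(i)} \{f_{i'}^j(\mathcal{I}_{i'}^j)\}.$$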

The functions $\{f_i^j\}$ and $\{\phi_i\}$ can be used to generate a network coding solution which supports
$k$ units of flow from $s$ to $\{v_1^r, \dots, v_n^r\}$ on $\mathcal{G}^{NC}$ as follows:

For each vertex $v \in V_{NC}$, let $\mathrm{IN}(v)$ be whatever $v$ receives on its incoming edges. Let $g_v$
be the encoding function at vertex $v$, and $g_v(e, \mathrm{IN}(v))$ be the encoded message which vertex $v$
sends along $e$ ($e$ is an outgoing edge from $v$).

If $e$ is an edge of infinite capacity emanating from $v$, let $g_v(e, \mathrm{IN}(v)) = \mathrm{IN}(v)$.

Let $s$ send $p_i$ along edge $(s, u_i)$. At this point, we have $\mathrm{IN}(v_i^0) = P_i = \mathcal{I}_i^1$. For each $i \in [n]$, let
$g_{v_i^0}((v_i^0, w_i^1), \mathrm{IN}(v_i^0)) = f_i^1(\mathcal{I}_i^1)$. By a simple inductive argument, defining the encoding functions
$g_{v_i^j}((v_i^j, w_i^{j+1}), \mathrm{IN}(v_i^j))$ to be equal to $f_i^{j+1}$ yields the result that $\mathrm{IN}(v_i^r) = \big(\mathcal{I}_i^r, \bigcup_{i' \in \Gamma(i)} \{f_{i'}^r(\mathcal{I}_{i'}^r)\}\big)$.
Hence, the decoding function $\phi_i$ can be used at $v_i^r$ to allow error-free reconstruction of the $k$-unit
flow.

The equivalence argument is completed by showing that a network coding solution which
supports a $k$-unit multicast flow from $s$ to $\{v_1^r, \dots, v_n^r\}$ on $\mathcal{G}^{NC}$ also solves the universal recovery
problem on $\mathcal{T}$. This is argued in a similar manner as above, and is therefore omitted.

Since we have shown that the universal recovery problem on $\mathcal{T}$ is equivalent to a network
coding problem on $\mathcal{G}^{NC}$, the celebrated max-flow min-cut result of Ahlswede et al. [4] is
applicable. In particular, a fixed vector $\{b_i^j\}$ admits a solution to the universal recovery problem
where node $i$ makes at most $b_i^j$ transmissions during the $j$-th round of communication if and only
if any cut separating $s$ from some $v_i^r$ in $\mathcal{G}^{NC}$ has capacity at least $k$.

What remains to be shown is that the inequalities defining $\mathcal{R}_r(\mathcal{T})$ are satisfied if and only if
any cut separating $s$ from some $v_i^r$ in $\mathcal{G}^{NC}$ has capacity at least $k$.

To this end, suppose we have a cut $(S, S^c)$ satisfying $s \in S^c$ and $v_i^r \in S$ for some $i \in [n]$.
We will modify the cut $(S, S^c)$ to produce a new cut $(S', S'^c)$ with capacity less than or equal
to the capacity of the original cut $(S, S^c)$.

Define the set $S_0 \subseteq [n]$ as follows: $i \in S_0$ iff $v_i^r \in S$ (by definition of $S$, we have that $S_0 \neq \emptyset$).
Initially, let $S' = S$. Modify the cut $(S', S'^c)$ as follows:
M1) If $i \in \Gamma(S_0)$, then place $w_i^r$ into $S'$.
M2) If $i \notin \Gamma(S_0)$, then place $w_i^r$ into $S'^c$.

Modifications M1 and M2 are justified (respectively) by J1 and J2:
J1) If $i \in \Gamma(S_0)$, then there exists an edge of infinite capacity from $w_i^r$ to some $v_{i'}^r \in S$. Thus,
moving $w_i^r$ to $S'$ (if necessary) does not increase the capacity of the cut.
J2) If $i \notin \Gamma(S_0)$, then there are no edges from $w_i^r$ to $S$, hence we can move $w_i^r$ into $S'^c$ (if
necessary) without increasing the capacity of the cut.

Modifications M1 and M2 guarantee that $w_i^r \in S'$ iff $i \in \Gamma(S_0)$. Thus, assume that $(S', S'^c)$
satisfies this condition and further modify the cut as follows:
M3) If $i \in S_0$, then place $v_i^{r-1}$ into $S'$.
M4) If $i \notin \Gamma(S_0)$, then place $v_i^{r-1}$ into $S'^c$.

Modifications M3 and M4 are justified (respectively) by J3 and J4:
J3) If $i \in S_0$, then there exists an edge of infinite capacity from $v_i^{r-1}$ to $v_i^r \in S$. Thus, moving
$v_i^{r-1}$ to $S'$ (if necessary) does not increase the capacity of the cut.
J4) If $i \notin \Gamma(S_0)$, then there are no edges from $v_i^{r-1}$ to $S'$ (since $w_i^r \notin S'$ by assumption),
hence we can move $v_i^{r-1}$ into $S'^c$ (if necessary) without increasing the capacity of the cut.

At this point, define the set $S_1 \subseteq [n]$ as follows: $i \in S_1$ iff $v_i^{r-1} \in S'$. Note that the
modifications of $S'$ guarantee that $S_1$ satisfies $S_0 \subseteq S_1 \subseteq \Gamma(S_0)$.

This procedure can be repeated for each layer of the graph resulting in a sequence of sets
$\emptyset \neq S_0 \subseteq \dots \subseteq S_r \subseteq [n]$ satisfying $S_j \subseteq \Gamma(S_{j-1})$ for each $j \in [r]$.

We now perform a final modification of the cut $(S', S'^c)$:
M5) If $p_j \in \bigcup_{i \in S_r} P_i$, then place $u_j$ into $S'$.
M6) If $p_j \notin \bigcup_{i \in S_r} P_i$, then place $u_j$ into $S'^c$.

Modifications M5 and M6 are justified (respectively) by J5 and J6:
J5) If $p_j \in \bigcup_{i \in S_r} P_i$, then there is an edge of infinite capacity from $u_j$ to $S'$ and moving $u_j$
into $S'$ (if necessary) does not increase the capacity of the cut.
J6) If $p_j \notin \bigcup_{i \in S_r} P_i$, then there are no edges from $u_j$ to $S'$, hence moving $u_j$ (if necessary)
into $S'^c$ cannot increase the capacity of the cut.

A quick calculation shows that the modified cut $(S', S'^c)$ has capacity greater than or equal
to $k$ iff:

$$\sum_{j=1}^{r} \sum_{i \in S_j^c \cap \Gamma(S_{j-1})} b_i^{(r+1-j)} \geq \Big| \bigcap_{i \in S_r} P_i^c \Big|. \tag{10}$$



Since every modification of the cut either preserved or reduced the capacity of the cut, the
original cut $(S, S^c)$ also has capacity greater than or equal to $k$ if the above inequality is satisfied.
In Figure 5, we illustrate a cut $(S, S^c)$ and its modified minimal cut $(S', S'^c)$ for the graph $\mathcal{G}^{NC}$
corresponding to the line network of Example 1.

By the equivalence of the universal recovery problem on a network $\mathcal{T}$ to the network coding
problem on $\mathcal{G}^{NC}$ and the max-flow min-cut theorem for network information flow, if a
transmission scheme solves the universal recovery problem on $\mathcal{T}$, then the associated $b_i^j$'s must satisfy the
constraints of the form given by (10). Conversely, for any set of $b_i^j$'s which satisfy the constraints
of the form given by (10), there exists a transmission scheme using exactly those numbers of
transmissions which solves the universal recovery problem for $\mathcal{T}$. Thus the constraints of (10),
and hence the inequalities defining $\mathcal{R}_r(\mathcal{T})$, are satisfied if and only if any cut separating $s$ from
some $v_i^r$ in $\mathcal{G}^{NC}$ has capacity at least $k$.



Remark 1: Since $\big| \bigcap_{i \in [n]} P_i^c \big| = 0$, constraints where $S_r = [n]$ are trivially satisfied. Therefore,
we can restrict our attention to sequences of sets where $S_r \subset [n]$.



B. Fully Connected Networks 

Proof of Theorem 2: In the case where $\mathcal{T}$ is a fully connected network, we have that
$S_j^c \cap \Gamma(S_{j-1}) = S_j^c$ for any nonempty $S_{j-1} \subset V$. Therefore, the constraints defining $\mathcal{R}_r(\mathcal{T})$ become:

$$\sum_{j=1}^{r} \sum_{i \in S_j^c} b_i^{(r+1-j)} \geq \Big| \bigcap_{i \in S_r} P_i^c \Big|. \tag{11}$$

Fig. 5. The graph $\mathcal{G}^{NC}$ corresponding to the line network of Example 1 with original cut $(S, S^c)$ and the corresponding
modified minimal cut $(S', S'^c)$. In this case, $S_0 = S_1 = S_2 = \{1\}$. Upon substitution into (10), this choice of $S_0, S_1, S_2$ yields
the inequality $b_2^1 + b_2^2 \geq 1$.



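Now, suppose a transmission schedule $\{b_i^j\} \in \mathcal{R}_r(\mathcal{T})$ and consider the modified transmission
schedule $\{\tilde{b}_i^j\}$ defined by: $\tilde{b}_i^r = \sum_{j=1}^{r} b_i^j$ and $\tilde{b}_i^j = 0$ for $j < r$. By construction, $S_{j+1}^c \subseteq S_j^c$ in
the constraints defining $\mathcal{R}_r(\mathcal{T})$. Therefore, using the definition of $\{\tilde{b}_i^j\}$, we have:

$$\sum_{i \in S_1^c} \tilde{b}_i^r \geq \sum_{j=1}^{r} \sum_{i \in S_j^c} b_i^{(r+1-j)} \geq \Big| \bigcap_{i \in S_r} P_i^c \Big|.$$

Thus the modified transmission schedule is also in $\mathcal{R}_r(\mathcal{T})$. Since $\big| \bigcap_{i \in S_1} P_i^c \big| \geq \big| \bigcap_{i \in S_r} P_i^c \big|$,
when $\mathcal{T}$ is a fully connected network, it is sufficient to consider constraints of the form:

$$\sum_{i \in S^c} b_i \geq \Big| \bigcap_{i \in S} P_i^c \Big| \quad \text{for all nonempty } S \subset V. \tag{12}$$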



This proves the latter two statements of the theorem: that the cut-set constraints are necessary
and sufficient for universal recovery when $\mathcal{T}$ is a fully connected network, and that a single
round of communication is sufficient to achieve universal recovery with $M^*(\mathcal{T})$ transmissions.
With these results established, an optimal transmission schedule can be obtained by solving
the following integer linear program:

$$\text{minimize } \sum_{i=1}^{n} b_i \tag{13}$$

$$\text{subject to: } \sum_{i \in S^c} b_i \geq \Big| \bigcap_{i \in S} P_i^c \Big| \quad \text{for each nonempty } S \subset V.$$

In order to accomplish this, we identify $B_i \leftarrow P_i^c$ and set $w_i = 1$ for $i \in [n]$ and apply the
submodular algorithm presented in Appendix A. ■
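For very small networks, ILP (13) can also be solved by exhaustive search, which is useful as a sanity check on the cut-set constraints. The sketch below is not the submodular algorithm of Appendix A; it simply enumerates integer vectors in order of increasing total. The three-node instance and the name `min_transmissions` are assumptions made for illustration.

```python
from itertools import combinations, product


def min_transmissions(packet_sets, k):
    """Brute-force ILP (13): smallest sum of b_i meeting every cut-set constraint."""
    n = len(packet_sets)
    universe = set(range(k))
    missing = [universe - set(p) for p in packet_sets]  # complements P_i^c
    constraints = []
    for size in range(1, n):                  # nonempty proper subsets S of V
        for S in combinations(range(n), size):
            jointly_missing = set(universe)
            for i in S:
                jointly_missing &= missing[i]
            outside = [i for i in range(n) if i not in S]
            constraints.append((outside, len(jointly_missing)))
    for total in range(n * k + 1):            # every node sending k packets suffices
        for b in product(range(total + 1), repeat=n):
            if sum(b) != total:
                continue
            if all(sum(b[i] for i in out) >= need for out, need in constraints):
                return total, b
    return None


# Hypothetical fully connected network: 3 nodes, 3 packets, each node missing one.
total, schedule = min_transmissions([{0, 1}, {1, 2}, {0, 2}], k=3)
print(total, schedule)
```

For this instance the search returns a total of 2 transmissions, so by Theorem 6 the corresponding secret-key capacity would be $k - M^*(\mathcal{T}) = 1$ packet.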

Now we consider fully connected networks in which packets are randomly distributed
according to (1), which is parametrized by $q$. The proof of Theorem 3 requires the following
lemma:

Lemma 1: If $0 < q < 1$ is fixed, then there exists some $\delta > 0$ such that the following
inequality holds for all $\ell \in \{2, \dots, n-1\}$:

$$\frac{n-\ell}{n-1} - \frac{(1-q)^{\ell} - (1-q)^{n}}{1 - q - (1-q)^{n}} \geq \delta > 0.$$

Proof: Applying Jensen's inequality to the strictly convex function $f(x) = (1-q)^x$ using
the convex combination $\ell = \theta \cdot 1 + (1-\theta) \cdot n$ yields:

$$\frac{(1-q)^{\ell} - (1-q)^{n}}{1 - q - (1-q)^{n}} < \frac{n-\ell}{n-1}.$$

Taking $\delta$ to be the minimum gap in the above inequality for the values $\ell \in \{2, \dots, n-1\}$
completes the proof. ■
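The gap in Lemma 1 is easy to evaluate numerically. The following sketch, with illustrative function names and arbitrarily chosen values of $n$ and $q$, computes the gap for each $\ell$ and the resulting $\delta$; at the endpoints $\ell = 1$ and $\ell = n$ the two sides coincide, which is why the lemma is restricted to $\ell \in \{2, \dots, n-1\}$.

```python
def lemma1_gap(n, q, ell):
    """Left-hand side of Lemma 1: (n-ell)/(n-1) minus the ratio of (1-q) powers."""
    a = 1.0 - q
    return (n - ell) / (n - 1) - (a**ell - a**n) / (a - a**n)


def lemma1_delta(n, q):
    """Minimum gap over ell in {2, ..., n-1}; requires n >= 3."""
    return min(lemma1_gap(n, q, ell) for ell in range(2, n))


for q in (0.1, 0.5, 0.9):
    print(q, lemma1_delta(6, q))
```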

Proof of Theorem 3: We begin by showing that the LP

$$\text{minimize } \sum_{i=1}^{n} b_i \tag{14}$$

$$\text{subject to: } \sum_{i \in S^c} b_i \geq \Big| \bigcap_{i \in S} P_i^c \Big| \quad \text{for each nonempty } S \subset V \tag{15}$$

has an optimal value of $\frac{1}{n-1} \sum_{i=1}^{n} |P_i^c|$ with high probability. To this end, note that the inequalities

$$\sum_{i=1, \, i \neq j}^{n} b_i \geq |P_j^c| \quad \text{for } 1 \leq j \leq n \tag{16}$$

are a subset of the inequality constraints (15). Summing both sides of (16) over $1 \leq j \leq n$
reveals that any feasible vector $b \in \mathbb{R}^n$ for LP (14)-(15) must satisfy:

$$\sum_{i=1}^{n} b_i \geq \frac{1}{n-1} \sum_{i=1}^{n} |P_i^c|. \tag{17}$$

This establishes a lower bound on the optimal value of the LP. We now identify a solution that
is feasible with probability approaching 1 as $k \to \infty$ while achieving the lower bound of (17)
with equality. To begin, note that

$$\tilde{b}_j = \frac{1}{n-1} \sum_{i=1}^{n} |P_i^c| - |P_j^c| \tag{18}$$

is a solution to the system of linear equations given by (16) and achieves (17) with equality.
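The two properties claimed for (18) can be confirmed with exact arithmetic. This is a small sketch under assumed values of $|P_i^c|$ (the list `missing` is made up for illustration, as is the function name): it checks that the candidate meets each inequality in (16) with equality and that its sum equals the lower bound (17).

```python
from fractions import Fraction


def candidate_solution(missing_counts):
    """Closed form (18): b_j = (1/(n-1)) * sum_i |P_i^c| - |P_j^c|."""
    n = len(missing_counts)
    average = Fraction(sum(missing_counts), n - 1)
    return [average - m for m in missing_counts]


missing = [3, 3, 4, 2]           # hypothetical values of |P_i^c| for n = 4 nodes
b = candidate_solution(missing)

# (16) holds with equality: the sum over i != j of b_i equals |P_j^c|.
print([sum(b) - b[j] == missing[j] for j in range(len(b))])
# (17) holds with equality: sum of b_i equals (1/(n-1)) * sum of |P_i^c|.
print(sum(b) == Fraction(sum(missing), len(missing) - 1))
```

Note that the closed form need not be nonnegative or feasible for every packet distribution; the proof only claims feasibility with high probability under the random model.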



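Now, we prove that $(\tilde{b}_1, \dots, \tilde{b}_n)$ is a feasible solution to LP (14) with high probability. To be
specific, we must verify that

$$\sum_{i \in S^c} \tilde{b}_i \geq \Big| \bigcap_{i \in S} P_i^c \Big| \tag{19}$$

holds with high probability for all subsets $S \subset V$ satisfying $2 \leq |S| \leq n-1$ (the case $|S| = 1$
is satisfied by the definition of $\{\tilde{b}_i\}_{i=1}^{n}$). Substitution of (18) into (19) along with some algebra
yields that the following equivalent conditions must hold:

$$\frac{n-|S|}{n-1} \sum_{i=1}^{n} \frac{1}{k} |P_i^c| - \sum_{i \in S^c} \frac{1}{k} |P_i^c| \geq \frac{1}{k} \Big| \bigcap_{i \in S} P_i^c \Big|. \tag{20}$$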



To this end, note that for any $S$, $\big| \bigcap_{i \in S} P_i^c \big|$ is a random variable which can be expressed
as $\big| \bigcap_{i \in S} P_i^c \big| = \sum_{j=1}^{k} X_j^S$, where $X_j^S$ is an indicator random variable taking the value 1 if
$p_j \in \bigcap_{i \in S} P_i^c$ and 0 otherwise. From (1) we have:

$$\Pr(X_j^S = 1) = \frac{(1-q)^{|S|} - (1-q)^{n}}{1 - (1-q)^{n}}.$$

By the weak law of large numbers, for any $\eta > 0$:

$$\Pr\left( \left| \frac{1}{k} \Big| \bigcap_{i \in S} P_i^c \Big| - \frac{(1-q)^{|S|} - (1-q)^{n}}{1 - (1-q)^{n}} \right| > \eta \right) < \epsilon_k, \tag{21}$$

where $\epsilon_k \to 0$ as $k \to \infty$. Thus, by the union bound, Lemma 1, and taking $\eta$ sufficiently small,
the following string of inequalities holds with arbitrarily high probability as $k \to \infty$:

\begin{align}
\frac{n-|S|}{n-1} \sum_{i=1}^{n} \frac{1}{k} |P_i^c| - \sum_{i \in S^c} \frac{1}{k} |P_i^c|
&\geq \frac{n-|S|}{n-1} \cdot \frac{(1-q) - (1-q)^{n}}{1 - (1-q)^{n}} - (2n-1)\eta \\
&\geq \frac{(1-q)^{|S|} - (1-q)^{n}}{1 - (1-q)^{n}} + \eta \\
&\geq \frac{1}{k} \Big| \bigcap_{i \in S} P_i^c \Big|.
\end{align}

These steps are justified as follows: for $\eta$ sufficiently small the first and last inequalities hold
with high probability by (21), and the second inequality follows from Lemma 1 with $\ell = |S|$.
This proves that (20) holds, and therefore $(\tilde{b}_1, \dots, \tilde{b}_n)$ is a feasible solution to LP (14) with
high probability. Now, taking Corollary 1 in Appendix A together with Theorem 2 completes
the proof. ■



C. d-Regular Networks 

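Lemma 2: Assume packets are randomly distributed in a $d$-regular network $\mathcal{T}$. For any $\epsilon > 0$,
there exists an optimal solution $x^*$ to LP (3)-(4) which satisfies

$$\left\| x^* - \frac{1}{d} \mathbb{E}[|P_1^c|] \mathbf{1} \right\|_{\infty} < \epsilon k$$

with probability approaching 1 as $k \to \infty$, where $\mathbb{E}$ indicates expectation.

Proof: Let $P = (|P_1^c|, \dots, |P_n^c|)^T$ and let $A$ be the adjacency matrix of $\mathcal{G}$ (i.e., $a_{i,j} = 1$ if
$(i,j) \in E$ and 0 otherwise). Observe that $A$ is symmetric and $A\mathbf{1} = d\mathbf{1}$, where $\mathbf{1}$ denotes a
column vector of 1's. With this notation, LP (3) can be rewritten as:

$$\text{minimize } \mathbf{1}^T x \tag{22}$$

$$\text{subject to: } Ax \succeq P,$$

where "$a \succeq b$" for vectors $a, b \in \mathbb{R}^n$ means that $a_i \geq b_i$ for $i = 1, \dots, n$.

Let $A^+$ denote the Moore-Penrose pseudoinverse of $A$. Observe that the linear least squares
solution to $Ax \approx P$ is given by:

\begin{align}
x_{LS} &= A^+ P \\
&= A^+ \mathbb{E}P + A^+ (P - \mathbb{E}P) \\
&= \frac{1}{d} \mathbb{E}P + A^+ (P - \mathbb{E}P).
\end{align}

For the last step above, note that $\mathbb{E}P$ is an eigenvector of $A$ with eigenvalue $d$ so $\mathbb{E}P$ will also
be an eigenvector of $A^+$ with eigenvalue $\frac{1}{d}$. Hence,

$$\left\| x_{LS} - \frac{1}{d} \mathbb{E}P \right\|_2 = \| A^+ (P - \mathbb{E}P) \|_2 \leq \| A^+ \|_2 \| P - \mathbb{E}P \|_2.$$

Combining this with the triangle inequality implies that, for any vector $y$,

\begin{align}
\left\| y - \frac{1}{d} \mathbb{E}P \right\|_{\infty}
&\leq \| y - x_{LS} \|_{\infty} + \left\| x_{LS} - \frac{1}{d} \mathbb{E}P \right\|_{\infty} \\
&\leq \| y - x_{LS} \|_{\infty} + \left\| x_{LS} - \frac{1}{d} \mathbb{E}P \right\|_{2} \\
&\leq \| y - x_{LS} \|_{\infty} + \| A^+ \|_2 \| P - \mathbb{E}P \|_2.
\end{align}

Therefore, Lemma 7 (see Appendix B) guarantees the existence of an optimal solution $x^*$ to
LP (22) (and consequently LP (3)) which satisfies:

\begin{align}
\left\| x^* - \frac{1}{d} \mathbb{E}P \right\|_{\infty}
&\leq \| x^* - x_{LS} \|_{\infty} + \| A^+ \|_2 \| P - \mathbb{E}P \|_2 \\
&\leq c_A \| A x_{LS} - P \|_2 + \| A^+ \|_2 \| P - \mathbb{E}P \|_2 \\
&\leq c_A \left\| \frac{1}{d} A \mathbb{E}P - P \right\|_2 + \| A^+ \|_2 \| P - \mathbb{E}P \|_2 \\
&= c_A \| \mathbb{E}P - P \|_2 + \| A^+ \|_2 \| P - \mathbb{E}P \|_2,
\end{align}

where $c_A$ is a constant depending only on $A$. By the weak law of large numbers, $\| P - \mathbb{E}P \|_2 < \epsilon k$
with probability tending to 1 as $k \to \infty$ for any $\epsilon > 0$. Noting that $\mathbb{E}P = \mathbb{E}[|P_1^c|] \mathbf{1}$ completes
the proof. ■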

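Proof of Theorem 4: We begin with some observations and definitions:

• First, recall that our model for randomly distributed packets (1) implies that

$$\mathbb{E} \Big| \bigcap_{i \in S} P_i^c \Big| = k \, \frac{(1-q)^{|S|} - (1-q)^{n}}{1 - (1-q)^{n}} \quad \text{for all nonempty } S \subseteq V. \tag{23}$$

• With this in mind, there exists a constant $c_q > 0$ such that

$$\mathbb{E}[|P_1^c|] \geq (1 + c_q) \, \mathbb{E} \Big| \bigcap_{i \in S} P_i^c \Big| \quad \text{for all } S \subset V, \ |S| \geq 2. \tag{24}$$

• Next, Lemma 1 implies the existence of a constant $\delta_q > 0$ such that for any $S \subset V$ with
$2 \leq |S| \leq n-1$:

$$\frac{n-|S|}{n-1} \geq \frac{(1-q)^{|S|} - (1-q)^{n}}{(1-q) - (1-q)^{n}} + \delta_q = \frac{\mathbb{E} \big| \bigcap_{i \in S} P_i^c \big|}{\mathbb{E}[|P_1^c|]} + \delta_q. \tag{25}$$

• The weak law of large numbers implies that

$$\left( 1 + \frac{\min\{\delta_q, c_q\}}{4} \right) \mathbb{E} \Big| \bigcap_{i \in S} P_i^c \Big| \geq \Big| \bigcap_{i \in S} P_i^c \Big| \tag{26}$$

with probability approaching 1 as $k \to \infty$.

• Finally, for the proof below, we will take the number of communication rounds sufficiently
large to satisfy

$$r \geq \max\left\{ \frac{2d}{n \delta_q}, \frac{2n(1 + c_q)}{d c_q} \right\}. \tag{27}$$

Fix $\epsilon > 0$. Lemma 2 guarantees that there exists an optimal solution $x^*$ to LP (3) satisfying

$$\left\| x^* - \frac{1}{d} \mathbb{E}[|P_1^c|] \mathbf{1} \right\|_{\infty} < \epsilon k \tag{28}$$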



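with probability tending to 1 in $k$. Now, it is always possible to construct a transmission schedule
$\{b_i^j\}$ which satisfies $\sum_j b_i^j = \lceil x_i^* \rceil$ and $\lfloor \frac{1}{r} x_i^* \rfloor \leq b_i^j \leq \lceil \frac{1}{r} x_i^* \rceil$ for each $i, j$. Observe that $\sum_{i,j} b_i^j <
n + \sum_{i=1}^{n} x_i^*$. Thus, proving that $\{b_i^j\} \in \mathcal{R}_r(\mathcal{T})$ with high probability will prove the theorem.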
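One way to build a schedule with the properties stated above is to spread $\lceil x_i^* \rceil$ as evenly as possible over the $r$ rounds. The sketch below illustrates that idea; it is an assumption about how such a schedule might be constructed, not the authors' construction, and the function name and sample values are made up.

```python
import math
from fractions import Fraction


def split_across_rounds(x_star, r):
    """Return schedule[i][j] = transmissions of node i in round j+1.

    Each row sums to ceil(x_i) and every entry lies between
    floor(x_i / r) and ceil(x_i / r), assuming x_i >= 0.
    """
    schedule = []
    for x in x_star:
        total = math.ceil(x)
        base, extra = divmod(total, r)
        # The first `extra` rounds carry one additional transmission.
        schedule.append([base + 1 if j < extra else base for j in range(r)])
    return schedule


x_star = [Fraction(7, 2), Fraction(5), Fraction(1, 3)]  # hypothetical LP solution
print(split_across_rounds(x_star, r=3))
```

With this rounding, the total number of transmissions exceeds $\sum_i x_i^*$ by less than $n$, matching the bound noted in the text.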

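Since the network is $d$-regular, $|\partial(S)| \geq d$ whenever $|S| \leq n-d$ and $|\partial(S_1)| \geq n - |S_2|$
whenever $|S_2| \geq n-d$ and $S_1 \subseteq S_2$. We consider the cases where $2 \leq |S_r| \leq n-d$ and
$n-d \leq |S_r| \leq n-1$ separately. The case where $|S_r| = 1$ coincides precisely with the constraints
(4), and hence is satisfied by definition of $\{b_i^j\}$.

Considering the case where $2 \leq |S_r| \leq n-d$, we have the following string of inequalities:

\begin{align}
\sum_{j=1}^{r} \sum_{i \in S_j^c \cap \Gamma(S_{j-1})} b_i^{(r+1-j)}
&\geq \sum_{j=1}^{r} \sum_{i \in S_j^c \cap \Gamma(S_{j-1})} \left\lfloor \frac{1}{r} x_i^* \right\rfloor \tag{29} \\
&\geq \frac{1}{r} \sum_{j=1}^{r} \sum_{i \in S_j^c \cap \Gamma(S_{j-1})} x_i^* - nr \tag{30} \\
&= \frac{1}{r} \sum_{j=1}^{r} \sum_{i \in \partial(S_{j-1})} x_i^* - \frac{1}{r} \sum_{i \in S_r \cap S_0^c} x_i^* - nr \tag{31} \\
&\geq \frac{1}{rd} \sum_{j=1}^{r} \sum_{i \in \partial(S_{j-1})} \mathbb{E}[|P_1^c|] - \frac{1}{rd} \sum_{i \in S_r \cap S_0^c} \mathbb{E}[|P_1^c|] - nk\epsilon - nr \tag{32} \\
&\geq \frac{1}{rd} \mathbb{E}[|P_1^c|] \Big( \sum_{j=1}^{r} |\partial(S_{j-1})| - n \Big) - nr(k\epsilon + 1) \tag{33} \\
&\geq \frac{1 + c_q}{rd} \, \mathbb{E} \Big| \bigcap_{i \in S_r} P_i^c \Big| \Big( \sum_{j=1}^{r} |\partial(S_{j-1})| - n \Big) - nr(k\epsilon + 1) \tag{34} \\
&\geq \frac{1 + c_q}{rd} \, \mathbb{E} \Big| \bigcap_{i \in S_r} P_i^c \Big| (rd - n) - nr(k\epsilon + 1) \tag{35} \\
&\geq \Big( 1 + \frac{c_q}{2} \Big) \mathbb{E} \Big| \bigcap_{i \in S_r} P_i^c \Big| - nr(k\epsilon + 1) \tag{36} \\
&\geq \Big( 1 + \frac{c_q}{4} \Big) \mathbb{E} \Big| \bigcap_{i \in S_r} P_i^c \Big| \tag{37} \\
&\geq \Big| \bigcap_{i \in S_r} P_i^c \Big|. \tag{38}
\end{align}

The above string of inequalities holds with probability tending to 1 as $k \to \infty$. They can be
justified as follows:

• (29) follows by definition of $\{b_i^j\}$.

• (30) follows since $\lfloor \frac{1}{r} x_i^* \rfloor \geq \frac{1}{r} x_i^* - 1$ and $|S_j^c \cap \Gamma(S_{j-1})| \leq n$.

• (31) follows from writing $\bigcup_{j=1}^{r} S_j^c \cap \Gamma(S_{j-1})$ as $\big( \bigcup_{j=1}^{r} \partial(S_{j-1}) \big) \setminus (S_r \cap S_0^c)$ and expanding
the sum.

• (32) follows from (28).

• (33) is true since $|S_0^c \cap S_r| \leq n$.

• (34) follows from (24).

• (35) follows from $|\partial(S_{j-1})| \geq d$ by $d$-regularity and the assumption that $2 \leq |S_r| \leq n-d$.

• (36) follows from our choice of $r$ given in (27).

• (37) follows since $\frac{c_q}{4} \mathbb{E} \big| \bigcap_{i \in S_r} P_i^c \big| \geq nr(k\epsilon + 1)$ with high probability for $\epsilon$ sufficiently
small.

• (38) follows from (26).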



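Next, consider the case where $n-d \leq |S_r| \leq n-1$. Starting from (33), we obtain:

\begin{align}
\sum_{j=1}^{r} \sum_{i \in S_j^c \cap \Gamma(S_{j-1})} b_i^{(r+1-j)}
&\geq \frac{1}{rd} \mathbb{E}[|P_1^c|] \Big( \sum_{j=1}^{r} |\partial(S_{j-1})| - n \Big) - nr(k\epsilon + 1) \tag{39} \\
&\geq \frac{1}{rd} \mathbb{E}[|P_1^c|] \big( r(n - |S_r|) - n \big) - nr(k\epsilon + 1) \tag{40} \\
&= \mathbb{E}[|P_1^c|] \Big( \frac{n - |S_r|}{d} - \frac{n}{rd} \Big) - nr(k\epsilon + 1) \tag{41} \\
&\geq \mathbb{E}[|P_1^c|] \Big( \frac{n - |S_r|}{n-1} - \frac{n}{rd} \Big) - nr(k\epsilon + 1) \tag{42} \\
&\geq \mathbb{E}[|P_1^c|] \Big( \frac{\mathbb{E} \big| \bigcap_{i \in S_r} P_i^c \big|}{\mathbb{E}[|P_1^c|]} + \delta_q - \frac{n}{rd} \Big) - nr(k\epsilon + 1) \tag{43} \\
&\geq \mathbb{E}[|P_1^c|] \Big( \frac{\mathbb{E} \big| \bigcap_{i \in S_r} P_i^c \big|}{\mathbb{E}[|P_1^c|]} + \frac{\delta_q}{2} \Big) - nr(k\epsilon + 1) \tag{44} \\
&\geq \mathbb{E} \Big| \bigcap_{i \in S_r} P_i^c \Big| + \frac{\delta_q}{4} \mathbb{E}[|P_1^c|] \tag{45} \\
&\geq \Big| \bigcap_{i \in S_r} P_i^c \Big|. \tag{46}
\end{align}

The above string of inequalities holds with probability tending to 1 as $k \to \infty$. They can be
justified as follows:

• (39) is simply (33) repeated for convenience.

• (40) follows since $n-d \leq |S_r| \leq n-1$ and hence $d$-regularity implies that $|\partial(S_{j-1})| \geq
(n - |S_r|)$.

• (42) follows since $d \leq n-1$.

• (43) follows from (25).

• (44) follows from our definition of $r$ given in (27).

• (45) follows since $\frac{\delta_q}{4} \mathbb{E}[|P_1^c|] \geq nr(k\epsilon + 1)$ with high probability for $\epsilon$ sufficiently small.

• (46) follows from (26).

Thus, we conclude that, for $\epsilon$ sufficiently small, the transmission schedule $\{b_i^j\}$ satisfies each
of the inequalities defining $\mathcal{R}_r(\mathcal{T})$ with probability tending to 1. Since the number of such
inequalities is finite, an application of the union bound completes the proof that $\{b_i^j\} \in \mathcal{R}_r(\mathcal{T})$
with probability tending to 1 as $k \to \infty$. ■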



D. Divisible Packets 

Proof of Theorem |5]- Fix any e > and let x* be an optimal solution to LP (|5]). Put 
hi = X* + t. Note that hi is nonnegative. This follows by considering the set 5'\{2} in the 
inequality constraint ([6]), which implies x* > 0. 

Now, take an integer r > e^^nmaxi<i<„ hi. If packets are t-divisible, we can find a transmis- 
sion schedule {6^} such that ^hi <hl <^hi + j for all i E[n],j E [r]. 

Thus, for any (5*0, ■ ■ ■ , 5*^) G S'^'''\Q) we have the following string of inequalities: 






^(r'+l-.) > 



i=i iesinrcsj-i) 

^E E -: + ;Ei3(s,-i)i-; E "- 



> 



> 



n ^; 



1 "" 

ieSr 



'j-i 



j=i 



n 



J65gn5r 



e max hi 

r l<i<n 



Hence, Theorem 1 implies that the transmission schedule $\{b_i^j\}$ is sufficient to achieve universal
recovery. Noting that



n n 

EmsE'-. + t^E-^+^G 

i,j i=l i=l 



completes the proof of the theorem.

E. Secrecy Generation

In this subsection, we prove Theorems 6 and 7. We again remark that our proofs can be seen
as special cases of those in [14] which have been adapted for the problem at hand. For notational
convenience, define $P = \{p_1, \dots, p_k\}$. We will require the following lemma.

Lemma 3: Given a packet distribution $P_1, \dots, P_n$, let $K$ be a secret-key achievable with
communication $\mathbf{F}$. Then the following holds:

$$H(K | \mathbf{F}) = H(P) - \sum_{i=1}^{n} x_i \tag{47}$$

for some vector $x = (x_1, \dots, x_n)$ which is feasible for the following ILP:

$$\text{minimize } \sum_{i=1}^{n} x_i \tag{48}$$

$$\text{subject to: } \sum_{i \in S} x_i \geq \Big| \bigcap_{i \in S^c} P_i^c \Big| \quad \text{for each nonempty } S \subset V. \tag{49}$$

Moreover, if $K$ is a PK (with respect to a set $D$) and each node $i \in D$ transmits its respective
set of packets $P_i$, then

$$H(K | \mathbf{F}) = H(P | P_D) - \sum_{i \in V \setminus D} x_i \tag{50}$$

for some vector $x = (x_1, \dots, x_n)$ which is feasible for the ILP:

$$\text{minimize } \sum_{i \in V \setminus D} x_i \tag{51}$$

$$\text{subject to: } \sum_{i \in S} x_i \geq \Big| \bigcap_{i \in S^c} P_i^c \Big| \quad \text{for each nonempty } S \subset V \setminus D. \tag{52}$$

Remark 2: We remark that (49) and (52) are necessary and sufficient conditions for achieving
universal recovery in the networks $\mathcal{T}$ and $\mathcal{T}_D$ considered in Theorems 6 and 7, respectively.
Thus, the optimal values of ILPs (48) and (51) are equal to $M^*(\mathcal{T})$ and $M^*(\mathcal{T}_D)$, respectively.

Proof: We assume throughout that all entropies are with respect to the base-$|\mathbb{F}|$ logarithm
(i.e., information is measured in packets). For this and the following proofs, let $\mathbf{F} = (F_1, \dots, F_n)$
and $F_{[1,i]} = (F_1, \dots, F_i)$, where $F_i$ denotes the transmissions made by node $i$. For simplicity,
our proof does not take into account interactive communication, but can be modified to do so.



Footnote: Allowing interactive communication does not change the results. See [14] for details.

Since $K$ and $\mathbf{F}$ are functions of $P$:

\begin{align}
H(P) &= H(\mathbf{F}, K, P_1, \dots, P_n) \tag{53} \\
&= \sum_{i=1}^{n} H(F_i | F_{[1,i-1]}) + H(K | \mathbf{F}) + \sum_{i=1}^{n} H(P_i | \mathbf{F}, K, P_{[1,i-1]}). \tag{54}
\end{align}

Set $x_i = H(F_i | F_{[1,i-1]}) + H(P_i | \mathbf{F}, K, P_{[1,i-1]})$. Then, substituting $x_i$ into the above equation
yields:

$$H(K | \mathbf{F}) = H(P) - \sum_{i=1}^{n} x_i. \tag{55}$$

To show that $x = (x_1, \dots, x_n)$ is a feasible vector for ILP (48), we write:

\begin{align}
\Big| \bigcap_{i \in S^c} P_i^c \Big| &= H(P_S | P_{S^c}) \tag{56} \\
&= H(\mathbf{F}, K, P_S | P_{S^c}) \tag{57} \\
&= \sum_{i=1}^{n} H(F_i | F_{[1,i-1]}, P_{S^c}) + H(K | \mathbf{F}, P_{S^c}) + \sum_{i \in S} H(P_i | \mathbf{F}, K, P_{[1,i-1]}, P_{S^c \cap [i+1,n]}) \tag{58} \\
&\leq \sum_{i \in S} H(F_i | F_{[1,i-1]}) + \sum_{i \in S} H(P_i | \mathbf{F}, K, P_{[1,i-1]}) \tag{59} \\
&= \sum_{i \in S} x_i. \tag{60}
\end{align}

In the above inequality, we used the fact that conditioning reduces entropy, the fact that $K$ is a
function of $(\mathbf{F}, P_{S^c})$ for any $S \neq V$, and the fact that $F_i$ is a function of $P_i$ (by the assumption
that communication is not interactive).

To prove the second part of the lemma, we can assume $D = \{1, \dots, \ell\}$. The assumption that
each node $i$ in $D$ transmits all of the packets in $P_i$ implies $F_i = P_i$. Thus, for $i \in D$ we have
$x_i = H(P_i | P_{[1,i-1]})$. Repeating the above argument, we obtain

\begin{align}
H(K | \mathbf{F}) &= H(P) - H(P_D) - \sum_{i \in V \setminus D} x_i \tag{61} \\
&= H(P | P_D) - \sum_{i \in V \setminus D} x_i, \tag{62}
\end{align}

completing the proof of the lemma. ■

Proof of Theorem 6: Converse Part. Suppose $K$ is a secret-key achievable with
communication $\mathbf{F}$. Then, by definition of a SK and Lemma 3, we have

$$C_{SK} = H(K) = H(K | \mathbf{F}) = H(P) - \sum_{i=1}^{n} x_i \leq H(P) - M^*(\mathcal{T}) = k - M^*(\mathcal{T}). \tag{63}$$

Achievability Part. By definition, universal recovery can be achieved with $M^*(\mathcal{T})$
transmissions. Moreover, the communication $\mathbf{F}$ can be generated as a linear function of $P$ (see the proof
of Theorem 1 and [18]). Denote this linear transformation by $\mathbf{F} = \mathcal{L}P$. Note that $\mathcal{L}$ only depends
on the indices of the packets available to each node, not the values of the packets themselves
(see [18]). Let $\mathcal{P}_{\mathbf{F}} = \{P' : \mathcal{L}P' = \mathbf{F}\}$ be the set of all packet distributions which generate $\mathbf{F}$.

By our assumption that the packets are i.i.d. uniform from $\mathbb{F}$, each $P' \in \mathcal{P}_{\mathbf{F}}$ is equally likely
given $\mathbf{F}$ was observed. Since $\mathbf{F}$ has dimension $M^*(\mathcal{T})$, $|\mathcal{P}_{\mathbf{F}}| = |\mathbb{F}|^{k - M^*(\mathcal{T})}$. Thus, we can set
$\mathcal{K} = \mathbb{F}^{k - M^*(\mathcal{T})}$ and label each $P' \in \mathcal{P}_{\mathbf{F}}$ with a unique element in $\mathcal{K}$. The label for the actual $P$
(which is reconstructed by all nodes after observing $\mathbf{F}$) is the secret-key. Thus, $C_{SK} \geq k - M^*(\mathcal{T})$.

We remark that this labeling can be done efficiently by an appropriate linear transformation
mapping $P$ to $K$. ■

Proof of Theorem 7: Converse Part. Suppose $K$ is a private-key. Then, by definition of a
PK and Lemma 3,

\begin{align}
C_{PK} = H(K) = H(K | \mathbf{F}) &= H(P | P_D) - \sum_{i \in V \setminus D} x_i \\
&\leq H(P | P_D) - M^*(\mathcal{T}_D) = (k - |P_D|) - M^*(\mathcal{T}_D).
\end{align}

Achievability Part. Let each node $i \in D$ transmit $P_i$ so that we can update $P_j \leftarrow P_j \cup P_D$
for each $j \in V \setminus D$. Now, consider the universal recovery problem for only the nodes in $V \setminus D$.
$M^*(\mathcal{T}_D)$ is the minimum number of transmissions required among the nodes in $V \setminus D$ so that
each node in $V \setminus D$ recovers $P$. At this point, the achievability proof proceeds identically to the
SK case. ■

VI. Concluding Remarks

In this paper, we derive necessary and sufficient conditions for achieving universal recovery
in an arbitrarily connected network. For the case when the network is fully connected, we
provide an algorithm based on submodular optimization which efficiently solves the
cooperative data exchange problem. This algorithm and its derivation yield tight concentration results for the
case when packets are randomly distributed. Moreover, concentration results are provided when
the network is $d$-regular and packets are distributed randomly. If packets are divisible, we prove
that the traditional cut-set bounds are achievable. As a consequence of this and the concentration
results, we show that splitting packets does not typically provide a significant benefit when the
network is $d$-regular. Finally, we discuss an application to secrecy generation in the presence
of an eavesdropper. We demonstrate that our submodular algorithm can be used to generate the
maximum amount of secrecy in an efficient manner.

It is conceivable that the coded cooperative data exchange problem can be solved (or
approximated) in polynomial time if the network is $d$-regular, but packets aren't necessarily randomly
distributed. This is one possible direction for future work.

Appendix A

An Efficiently Solvable Integer Linear Program

In this appendix, we introduce a special ILP and provide an efficient algorithm for solving it.
This algorithm can be used to efficiently solve the cooperative data exchange problem when the
underlying graph is fully-connected. We begin by introducing some notation.

Let $E = \{1, \dots, n\}$ be a finite set with $n$ elements. We denote the family of all subsets of $E$
by $2^E$. We frequently use the compact notation $E \setminus U$ and $U + i$ to denote the sets $E \cap U^c$ and
$U \cup \{i\}$ respectively. For a vector $x = (x_1, \dots, x_n) \in \mathbb{R}^n$, define the corresponding functional
$x : 2^E \to \mathbb{R}$ as:

$$x(U) := \sum_{i \in U} x_i, \quad \text{for } U \subseteq E. \tag{64}$$

Footnote: We attempt to keep the notation generic in order to emphasize that the results in this appendix are not restricted to the
context of the cooperative data exchange problem.

Throughout this section, we let $\mathcal{F} = 2^E - \{\emptyset, E\}$ denote the family of nonempty proper
subsets of $E$. Let $B = \{B_1, \dots, B_n\}$. No special structure is assumed for the $B_i$'s except that
they are finite.

With the above notation established, we consider the following Integer Linear Program (ILP)
in this section:

$$\text{minimize} \left\{ \sum_{i \in E} w_i x_i \ : \ x(U) \geq \Big| \bigcap_{i \in E \setminus U} B_i \Big|, \ \forall \, U \in \mathcal{F}, \ x_i \in \mathbb{Z} \right\}. \tag{65}$$

It is clear that any algorithm that efficiently solves this ILP also solves ILP (13) by putting
$B_i \leftarrow P_i^c$ and $w = \mathbf{1}$.

A. Submodular Optimization 



Our algorithm for solving ILP (65) relies heavily on submodular function optimization. To
this end, we give a very brief introduction to submodular functions here.

A function $g : 2^E \to \mathbb{R}$ is said to be submodular if, for all $X, Y \in 2^E$,

$$g(X) + g(Y) \geq g(X \cap Y) + g(X \cup Y). \tag{66}$$

Over the past three decades, submodular function optimization has received a significant amount
of attention. Notably, several polynomial time algorithms have been developed for solving the
Submodular Function Minimization (SFM) problem

$$\min \{ g(U) : U \subseteq E \}. \tag{67}$$

We refer the reader to [25]-[27] for a comprehensive overview of SFM and known algorithms.

As we will demonstrate, we can solve ILP (65) via an algorithm that iteratively calls a SFM
routine. The most notable feature of SFM algorithms is their ability to solve problems with
exponentially many constraints in polynomial time. One of the key drawbacks of SFM is that
the problem formulation is very specific. Namely, SFM routines typically require the function $g$
to be submodular on all subsets of the set $E$.
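For intuition, definition (66) and problem (67) can be checked by brute force on a small ground set. The sketch below is exponential in $|E|$ and is only meant to illustrate the definitions; it is not one of the polynomial-time SFM algorithms surveyed in [25]-[27]. The example function $g(U) = 2\min(|U|, 1) - |U|$ and all function names are assumptions chosen for illustration.

```python
from itertools import combinations


def subsets(ground):
    """Yield every subset of the ground set as a frozenset."""
    items = sorted(ground)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def is_submodular(g, ground):
    """Check inequality (66) for every pair of subsets X, Y."""
    all_sets = list(subsets(ground))
    return all(
        g(X) + g(Y) >= g(X & Y) + g(X | Y) for X in all_sets for Y in all_sets
    )


def sfm_bruteforce(g, ground):
    """Solve (67) by enumeration: the minimum of g over all subsets."""
    return min(g(U) for U in subsets(ground))


E = {1, 2, 3, 4}


def g(U):
    # A concave function of |U|, hence submodular.
    return 2 * min(len(U), 1) - len(U)


print(is_submodular(g, E), sfm_bruteforce(g, E))
print(is_submodular(lambda U: len(U) ** 2, E))  # convex in |U|: not submodular
```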

B. The Algorithm 



We begin by developing an algorithm to solve an equality constrained version of ILP (65).
We will remark on the general case at the conclusion of this section. To this end, let $M$ be a
positive integer and consider the following ILP:

\begin{align}
\text{minimize} \quad & w^T x \tag{68} \\
\text{subject to:} \quad & x(U) \ge \Big| \bigcap_{i \in E \setminus U} B_i \Big| \ \text{ for all } U \in \mathcal{F}, \text{ and} \tag{69} \\
& x(E) = M. \tag{70}
\end{align}



Remark 3: We assume $w_i \ge 0$, else in the case without the equality constraint we could allow
the corresponding $x_i \to +\infty$ and the problem is unbounded from below.



Algorithm A.1: SolveILP(B, E, M, w)
    comment: Define $f : 2^E \to \mathbb{R}$ as in equation (71).
    x ← ComputePotentialX(f, M, w)
    if CheckFeasible(f, x)
        then return (x)
        else return (Problem Infeasible)



Theorem 8: Algorithm A.1 solves the equality constrained ILP (68) in polynomial time. If
feasible, Algorithm A.1 returns an optimal $x$. If infeasible, Algorithm A.1 returns "Problem
Infeasible".

Proof: The proof is accomplished in three steps:

1) First, we show that if our algorithm returns an $x$, it is feasible.

2) Second, we prove that if a returned $x$ is feasible, it is also optimal.

3) Finally, we show that if our algorithm does not return an $x$, then the problem is infeasible.

Each step is given its own subsection. ■

Algorithm A.1 relies on three basic subroutines given below:



Algorithm A.2: ComputePotentialX(f, M, w)
    comment: If feasible, returns x satisfying (69) and (70) that minimizes $w^T x$.
    comment: Order elements of E so that $w_1 \ge w_2 \ge \dots \ge w_n$.
    for i ← n to 2
        do  comment: Define $f_i(U) := f(U + i)$ for $U \subseteq \{i, \dots, n\}$.
            $x_i$ ← SFM($f_i$, {i, ..., n})
    $x_1 \leftarrow M - \sum_{i=2}^{n} x_i$
    return (x)

Algorithm A.3: CheckFeasible(f, x)
    comment: Check if $x(U) \le f(U)$ for all $U \in \mathcal{F}$ with $1 \in U$.
    comment: Define $f_1(U) := f(U + 1)$ for $U \subseteq E$.
    if SFM($f_1$, E) < 0
        then return (false)
        else return (true)

Algorithm A.4: SFM(f, V)
    comment: Minimize submodular function f over groundset V. See [25] for details.
    $v \leftarrow \min \{ f(U) : U \subseteq V \}$
    return (v)
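The control flow of Algorithms A.1 through A.4 can be sketched in a few lines of Python. This is an illustrative sketch only: the SFM routine of Algorithm A.4 is replaced by exhaustive enumeration (so the running time is exponential rather than polynomial), all function names are assumptions introduced here, and the elements $1, \dots, n$ are assumed to be already ordered so that $w_1 \ge \dots \ge w_n$. The function $f$ is taken from equation (71), and the full set $E$ is skipped in the feasibility check, which matches the convention $f(E) = 0$ of Remark 4.

```python
from itertools import combinations

def subsets_containing(i, pool):
    """Yield every subset of `pool` that contains element i."""
    rest = [j for j in pool if j != i]
    for r in range(len(rest) + 1):
        for combo in combinations(rest, r):
            yield frozenset(combo) | {i}

def f(U, B, x, M):
    """f(U) = M - |intersection of B_j over j in U| - x(U), as in (71)."""
    common = set.intersection(*(set(B[j]) for j in U))
    return M - len(common) - sum(x[j] for j in U)

def compute_potential_x(B, n, M):
    """Algorithm A.2 with SFM replaced by exhaustive search (exponential time)."""
    x = {i: 0 for i in range(1, n + 1)}
    for i in range(n, 1, -1):
        pool = range(i, n + 1)
        # x[i] is still 0 here, so f is evaluated exactly as at iteration i.
        x[i] = min(f(U, B, x, M) for U in subsets_containing(i, pool))
    x[1] = M - sum(x[i] for i in range(2, n + 1))
    return x

def check_feasible(B, n, x, M):
    """Algorithm A.3: every proper subset U containing element 1 needs f(U) >= 0."""
    E = frozenset(range(1, n + 1))
    return all(f(U, B, x, M) >= 0
               for U in subsets_containing(1, E) if U != E)

def solve_ilp(B, n, M):
    """Algorithm A.1; returns the vector x as a dict, or None if infeasible."""
    x = compute_potential_x(B, n, M)
    return x if check_feasible(B, n, x, M) else None
```

As a usage example under these assumptions, take three elements with $B_1 = \{a\}$, $B_2 = \{b\}$, $B_3 = \{c\}$. The constraints then force $x(E) \ge 2$, so `solve_ilp(B, 3, 1)` reports infeasibility while `solve_ilp(B, 3, 2)` places the two units on the two lowest-weight elements.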



C. Feasibility of a Returned x 



In this section, we prove that if Algorithm A.1 returns a vector $x$, it must be feasible. We
begin with some definitions.

Definition 6: A pair of sets $X, Y \subset E$ is called crossing if $X \cap Y \neq \emptyset$ and $X \cup Y \neq E$.

Definition 7: A function $g : 2^E \to \mathbb{R}$ is crossing submodular if

\[ g(X) + g(Y) \ge g(X \cap Y) + g(X \cup Y) \]

for $X, Y$ crossing.

We remark that minimization of crossing submodular functions is well established; however,
it involves a lengthy reduction to a standard submodular optimization problem. The
crossing family $\mathcal{F}$, by contrast, admits a straightforward algorithm, which is what we provide in
Algorithm A.1. We refer the reader to [27] for complete details on the general case.

For $M$ a positive integer, define

\[ f(U) := M - \Big| \bigcap_{i \in U} B_i \Big| - x(U), \quad \text{for } U \in \mathcal{F}. \tag{71} \]



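Lemma 4: The function $f$ is crossing submodular on $\mathcal{F}$.

Proof: For $X, Y \in \mathcal{F}$ crossing:

\begin{align*}
f(X) + f(Y) &= M - \Big| \bigcap_{i \in X} B_i \Big| - x(X) + M - \Big| \bigcap_{i \in Y} B_i \Big| - x(Y) \\
&= M - \Big| \bigcap_{i \in X} B_i \Big| - x(X \cap Y) + M - \Big| \bigcap_{i \in Y} B_i \Big| - x(X \cup Y) \\
&\ge M - \Big| \bigcap_{i \in X \cap Y} B_i \Big| - x(X \cap Y) + M - \Big| \bigcap_{i \in X \cup Y} B_i \Big| - x(X \cup Y) \\
&= f(X \cap Y) + f(X \cup Y).
\end{align*}
■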
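The inequality step in the proof of Lemma 4 rests on a counting fact that may be worth spelling out. The shorthand $A$ and $C$ below is introduced here for illustration and does not appear in the original argument. Write $A = \bigcap_{i \in X} B_i$ and $C = \bigcap_{i \in Y} B_i$. Then $\bigcap_{i \in X \cup Y} B_i = A \cap C$, and, since $X \cap Y$ is nonempty for a crossing pair, $A \cup C \subseteq \bigcap_{i \in X \cap Y} B_i$. Hence:

```latex
\begin{align*}
|A| + |C| &= |A \cup C| + |A \cap C| \\
          &\le \Big| \bigcap_{i \in X \cap Y} B_i \Big| + \Big| \bigcap_{i \in X \cup Y} B_i \Big| .
\end{align*}
```

The second line of the proof uses only the modularity of $x$, that is, $x(X) + x(Y) = x(X \cap Y) + x(X \cup Y)$.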



Observe that, with $f$ defined as above, the constraints of ILP (68) can be equivalently written
as:

\begin{align}
f(U) = M - \Big| \bigcap_{i \in U} B_i \Big| - x(U) &\ge 0 \ \text{ for all } U \in \mathcal{F}, \text{ and} \tag{72} \\
x(E) &= M. \tag{73}
\end{align}

Without loss of generality, assume the elements of $E$ are ordered lexicographically so that
$w_1 \ge w_2 \ge \dots \ge w_n$. At iteration $i$ in Algorithm A.2, $x_j = 0$ for all $j \le i$. Thus, setting

\begin{align}
x_i &\leftarrow \min_{U \subseteq \{i, \dots, n\}} \{ f_i(U) \} \tag{74} \\
&= \min_{U \subseteq \{i, \dots, n\} : i \in U} \{ f(U) \} \tag{75} \\
&= \min_{U \subseteq \{i, \dots, n\} : i \in U} \Big\{ M - \Big| \bigcap_{j \in U} B_j \Big| - x(U) \Big\} \tag{76}
\end{align}

and noting that the returned $x$ satisfies $x(E) = M$, rearranging (76) guarantees that

\[ x(E \setminus U) \ge \Big| \bigcap_{j \in U} B_j \Big|, \quad \text{for all } U \subseteq \{i, \dots, n\},\ i \in U \tag{77} \]

as desired. Iterating through $i \in \{2, \dots, n\}$ guarantees (77) holds for $2 \le i \le n$.

Remark 4: In the feasibility check routine (Algorithm A.3), we must be able to evaluate
$f_1(E)$. The reader can verify that putting $f(E) = 0$ preserves submodularity.

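Now, in order for the feasibility check to return true, we must have

\begin{align}
\min_{U \subseteq E} \{ f_1(U) \} &= \min_{U \subseteq E : 1 \in U} \{ f(U) \} \tag{78} \\
&= \min_{U \subseteq E : 1 \in U} \Big\{ M - \Big| \bigcap_{i \in U} B_i \Big| - x(U) \Big\} \tag{79} \\
&\ge 0, \tag{80}
\end{align}

implying that

\[ x(E \setminus U) \ge \Big| \bigcap_{i \in U} B_i \Big| \quad \text{for all } U \subset E,\ 1 \in U. \tag{81} \]

Combining (77) and (81) and noting that $x(E) = M$ proves that $x$ is indeed feasible. Moreover,
$x$ is integral as desired.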



D. Optimality of a Returned x 



In this section, we prove that if Algorithm A.1 returns a feasible $x$, then it is also optimal.
First, we require two more definitions and a lemma.

Definition 8: A constraint of the form (72) corresponding to $U$ is said to be tight for $U$ if

\[ f(U) = M - \Big| \bigcap_{i \in U} B_i \Big| - x(U) = 0. \tag{82} \]

Lemma 5: If $x$ is feasible, $X, Y$ are crossing, and their corresponding constraints are tight,
then the constraints corresponding to $X \cap Y$ and $X \cup Y$ are also tight.

Proof: Since the constraints corresponding to $X$ and $Y$ are tight, we have

\[ 0 = f(X) + f(Y) \ge f(X \cap Y) + f(X \cup Y) \ge 0. \tag{83} \]

The first inequality is due to submodularity and the last inequality holds since $x$ is feasible. This
implies the result. ■
Definition 9: A family of sets $\mathcal{L}$ is laminar if $X, Y \in \mathcal{L}$ implies either $X \cap Y = \emptyset$, $X \subset Y$,
or $Y \subset X$.

At iteration $k$ ($1 < k \le n$) of Algorithm A.2, let $U_k$ be the set where (76) achieves its
minimum. Note that $k \in U_k \subseteq \{k, \dots, n\}$. By construction, the constraint corresponding to $U_k$
is tight. Also, the constraint $x(E) = M$ is tight. From the $U_k$'s and $E$ we can construct a laminar
family as follows: if $U_j \cap U_k \neq \emptyset$ for $j < k$, then replace $U_j$ with $\tilde{U}_j \leftarrow U_k \cup U_j$. By Lemma 5,
the constraints corresponding to the sets in the newly constructed laminar family are tight. Call
this family $\mathcal{L}$. For each $i \in E$, there is a unique smallest set in $\mathcal{L}$ containing $i$. Denote this set
$L_i$. Since $k \in U_k \subseteq \{k, \dots, n\}$, $L_i \neq L_j$ for $i \neq j$. Note that $L_1 = E$ and $L_i \subset L_j$ only if $j < i$.

For each $L_i \in \mathcal{L}$ there is a unique smallest set $L_j$ such that $L_i \subset L_j$. We call $L_j$ the least
upper bound on $L_i$.



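Now, consider the dual linear program to (68):

\begin{align}
\text{maximize} \quad & -\sum_{U \in \mathcal{F}} \pi_U \Big( M - \Big| \bigcap_{i \in U} B_i \Big| \Big) - \pi_E M \tag{84} \\
\text{subject to:} \quad & \sum_{U \in \mathcal{F} : i \in U} \pi_U + \pi_E + w_i = 0, \ \text{ for } 1 \le i \le n \tag{85} \\
& \pi_U \ge 0 \ \text{ for } U \in \mathcal{F}, \text{ and } \pi_E \text{ free}. \tag{86}
\end{align}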



For each $L_i \in \mathcal{L}$, let the corresponding dual variable $\pi_{L_i} = w_j - w_i$, where $L_j$ is the least
upper bound on $L_i$. By construction, $\pi_{L_i} \ge 0$ since it was assumed that $w_1 \ge \dots \ge w_n$. Finally,
let $\pi_E = -w_1$ and $\pi_U = 0$ for $U \notin \mathcal{L}$.

Now, observe that:

\[ \sum_{U \in \mathcal{F} : i \in U} \pi_U + \pi_E + w_i = 0 \tag{87} \]

as desired for each $i$. Thus, $\pi$ is dual feasible. Finally, note that $\pi_U > 0$ only if $U \in \mathcal{L}$. However,
the primal constraints corresponding to the sets in $\mathcal{L}$ are tight. Thus, $(x, \pi)$ form a primal-dual
feasible pair satisfying complementary slackness conditions, and are therefore optimal.
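The identity (87) can be seen as a telescoping sum. The chain notation below is introduced here for illustration. For a fixed $i$, the sets of the laminar family that contain $i$ form a chain $L_{j_0} \subset L_{j_1} \subset \dots \subset L_{j_m}$ with $j_0 = i$ and $L_{j_m} = L_1 = E$, where each set is the least upper bound on its predecessor. Under that reading of the construction:

```latex
\begin{align*}
\sum_{U \in \mathcal{F} : i \in U} \pi_U
  &= \sum_{t=0}^{m-1} \pi_{L_{j_t}}
   = \sum_{t=0}^{m-1} \big( w_{j_{t+1}} - w_{j_t} \big)
   = w_1 - w_i , \\
\sum_{U \in \mathcal{F} : i \in U} \pi_U + \pi_E + w_i
  &= (w_1 - w_i) - w_1 + w_i = 0 .
\end{align*}
```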
E. No Returned x = Infeasibility

Finally, we prove that if the feasibility check returns false, then ILP (68) is infeasible. Note
by construction that the vector $x$ passed to the feasibility check satisfies

\[ M - \Big| \bigcap_{i \in U} B_i \Big| - x(U) \ge 0 \quad \text{for all nonempty } U \subseteq \{2, \dots, n\}, \tag{88} \]

and $x(E) = M$. Again, let $U_k$ be the set where (76) achieves its minimum and let $\mathcal{L}$ be the laminar
family generated by these $U_k$'s and $E$ exactly as before. Again, the constraints corresponding
to the sets in $\mathcal{L}$ are tight (this can be verified in a manner identical to the proof of Lemma 5).
Now, since $x$ failed the feasibility check, there exists some exceptional set $T$ with $1 \in T$ for
which

\[ M - \Big| \bigcap_{i \in T} B_i \Big| - x(T) < 0. \tag{89} \]

Generate a set $L_T$ as follows: Initialize $L_T \leftarrow T$. For each $L_i \in \mathcal{L}$, $L_i \neq E$, if $L_T \cap L_i \neq \emptyset$,
update $L_T \leftarrow L_T \cup L_i$. Now, we can add $L_T$ to family $\mathcal{L}$ while preserving the laminar property.
We pause to make two observations:

1) By an argument similar to the proof of Lemma 5, we have that

\[ M - \Big| \bigcap_{i \in L_T} B_i \Big| - x(L_T) < 0. \]

2) The sets in $\mathcal{L}$ whose least upper bound is $E$ form a partition of $E$. We note that $L_T$ is a
nonempty class of this partition. Call this partition $\mathcal{P}_{\mathcal{L}}$.
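Again consider the dual constraints, however, let $w_i = 0$ (this does not affect feasibility). For
each $L \in \mathcal{P}_{\mathcal{L}}$ define the associated dual variable $\pi_L = \alpha$, and let $\pi_E = -\alpha$. All other dual
variables are set to zero. It is easy to check that this $\pi$ is dual feasible. Now, the dual objective
function becomes:

\begin{align}
-\sum_{U \in \mathcal{F}} \pi_U \Big( M - \Big| \bigcap_{i \in U} B_i \Big| \Big) - \pi_E M
&= -\alpha \sum_{L \in \mathcal{P}_{\mathcal{L}}} \Big( M - \Big| \bigcap_{i \in L} B_i \Big| - x(L) + x(L) \Big) + \alpha M \tag{90} \\
&= -\alpha \sum_{L \in \mathcal{P}_{\mathcal{L}}} \Big( M - \Big| \bigcap_{i \in L} B_i \Big| - x(L) \Big) - \alpha x(E) + \alpha M \tag{91} \\
&= -\alpha \Big( M - \Big| \bigcap_{i \in L_T} B_i \Big| - x(L_T) \Big) \tag{92} \\
&\to +\infty \ \text{ as } \alpha \to \infty. \tag{93}
\end{align}

Thus, the dual is unbounded and therefore the primal problem must be infeasible.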
As an immediate corollary we obtain the following:

Corollary 1: The optimal values of the ILP:

\[ \min \Big\{ x(E) : x(U) \ge \Big| \bigcap_{i \in E \setminus U} B_i \Big|,\ U \in \mathcal{F},\ x_i \in \mathbb{Z} \Big\} \]

and the corresponding LP relaxation:

\[ \min \Big\{ x(E) : x(U) \ge \Big| \bigcap_{i \in E \setminus U} B_i \Big|,\ U \in \mathcal{F},\ x_i \in \mathbb{R} \Big\} \]

differ by less than 1.

Proof: Algorithm A.1 is guaranteed to return an optimal $x$ if the intersection of the polytope
and the hyperplane $x(E) = M$ is nonempty. Thus, if $M^*$ is the minimum such $M$, then the
optimal value of the LP must be greater than $M^* - 1$. ■

F. Solving the General ILP

Finally, we remark on how to solve the general case of the ILP without the equality constraint
given in (65). First, we state a simple convexity result.

Lemma 6: Let $p_w^*(M)$ denote the optimal value of ILP (68) when the equality constraint is
$x(E) = M$. We claim that $p_w^*(M)$ is a convex function of $M$.

Proof: Let $M_1$ and $M_2$ be integers and let $\theta \in [0, 1]$ be such that $M_\theta = \theta M_1 + (1 - \theta) M_2$ is
an integer. Let $x^{(1)}$ and $x^{(2)}$ be optimal vectors that attain $p_w^*(M_1)$ and $p_w^*(M_2)$ respectively. Let
$x^{(\theta)} = \theta x^{(1)} + (1 - \theta) x^{(2)}$. By convexity, $x^{(\theta)}$ is feasible, though not necessarily integer. However,
by the results from above, optimality is always attained by an integral vector. Thus, it follows
that:

\[ \theta p_w^*(M_1) + (1 - \theta) p_w^*(M_2) = \theta w^T x^{(1)} + (1 - \theta) w^T x^{(2)} = w^T x^{(\theta)} \ge p_w^*(M_\theta). \tag{94} \]

■

Noting that $p_w^*(M)$ is convex in $M$, we can perform bisection on $M$ to solve the ILP in the
general case. For our purposes, it suffices to have relatively loose upper and lower bounds on
$M$ since the complexity only grows logarithmically in the difference. A simple lower bound on
$M$ is given by $M \ge \max_i |B_i|$.
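The bisection over $M$ can be sketched as follows. This is a minimal sketch under stated assumptions rather than the authors' implementation: `p` stands for any routine that returns the optimal value of the equality constrained ILP for a given $M$ (or `math.inf` when that problem is infeasible), the name `argmin_convex_int` is hypothetical, and infeasible values of $M$ are assumed to occur only at the low end of the search range.

```python
import math

def argmin_convex_int(p, lo, hi):
    """Minimize a convex function p over the integers lo..hi by bisection.

    p may return math.inf for infeasible arguments; this sketch assumes
    infeasible values, if any, form an initial segment of the range.
    """
    while lo < hi:
        mid = (lo + hi) // 2
        right = p(mid + 1)
        if right == math.inf or right < p(mid):
            lo = mid + 1      # a minimizer lies strictly to the right of mid
        else:
            hi = mid          # p(mid) <= p(mid + 1): a minimizer is at or left of mid
    return lo, p(lo)
```

Each iteration halves the interval, so the number of calls to `p` grows logarithmically in `hi - lo`, in line with the remark above that loose bounds on $M$ suffice.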
G. Complexity

Our aim in this paper is not to give a detailed complexity analysis of our algorithm. This is
due to the fact that the complexity is dominated by the SFM over the set $E$ in Algorithm
A.3. Therefore, the complexity of Algorithm A.1 is essentially the same as the complexity of
the SFM solver employed.

However, we have performed a series of numerical experiments to demonstrate that Algorithm
A.1 performs quite well in practice. In our implementation, we ran the Fujishige-Wolfe (FW)
algorithm for SFM [28] based largely on a Matlab routine by A. Krause [29]. While the FW
algorithm has not been proven to run in polynomial time, it has been shown to work quite well
in practice [28] (similar to the Simplex algorithm for solving Linear Programs). Whether or not
FW has worst-case polynomial complexity is an open problem to date. We remark that there are
several SFM algorithms that run in strongly polynomial time which could be used if a particular
application requires polynomially bounded worst-case complexity [25].

In our series of experiments, we chose $B_i \subset F$ randomly, where $|F| = 50$. We let $n = |E|$
range from 10 to 190 in increments of 10. For each value of $n$, we ran 10 experiments. The
average computation time is shown in Figure 6, with error bars indicating one standard deviation.
We consistently observed that the computations run in approximately 0(n^ '^^) time. Due to 
the iterative nature of the SFM algorithm, we anticipate that the computation time could be 
significantly reduced by implementing the algorithm in C/C++ instead of Matlab. However, the
0{n^'^^) trend should remain the same. Regardless, we are able to solve the ILP problems under 
consideration with an astonishing $2^{190}$ constraints in approximately one minute.
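A power-law trend of the form $\alpha n^{\beta}$ can be estimated from timing data by ordinary least squares in log-log coordinates, which is the criterion described in the caption of Figure 6. The sketch below is illustrative only: the function name is an assumption, and any data passed to it here is synthetic, not the measurements reported in this paper.

```python
import math

def fit_power_law(ns, times):
    """Least-squares fit of log(t) = log(alpha) + beta * log(n); returns (alpha, beta)."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(t) for t in times]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    # Standard simple-linear-regression slope and intercept in log space.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta = sxy / sxx
    alpha = math.exp(y_bar - beta * x_bar)
    return alpha, beta
```

For instance, feeding it synthetic times generated as $0.5\, n^2$ for $n = 10, 20, \dots, 190$ recovers $\alpha \approx 0.5$ and $\beta \approx 2$ up to floating-point error.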

Fig. 6. Experimental results. For the red dotted line, the multiplicative constant $\alpha$ and exponent $\beta$ were chosen to minimize
the MSE $\sum_{n} | \log(\alpha n^{\beta}) - \log(\bar{m}_n) |^2$, where $\bar{m}_n$ is the sample mean of the computation times for $|E| = n$.

Appendix B
A Linear Programming Approximation Lemma

Lemma 7: Let $A \in \mathbb{R}^{n \times n}$ be a symmetric matrix with nonnegative entries and all column sums
equal to $d$. Let $x_y$ be the vector of minimum Euclidean norm which minimizes $\|A x_y - y\|_2$.
There exists an optimal solution $x^*$ to the linear program

\begin{align}
\text{minimize} \quad & \mathbf{1}^T x \tag{95} \\
\text{subject to:} \quad & A x \succeq y \notag
\end{align}

which satisfies

\[ \| x^* - x_y \|_\infty \le c_A \| A x_y - y \|_2, \]

where $c_A$ is a constant depending only on $A$.

Proof of Lemma 7: To begin the proof, we make a few definitions. Let $\lambda$ be the absolute
value of the nonzero eigenvalue of $A$ with smallest modulus (at least one exists since $d$ is
an eigenvalue). Define $\mathcal{N}(A)$ to be the nullspace of $A$, and let $\mathcal{N}^\perp(A)$ denote its orthogonal
complement. Finally, let $A^+$ denote the Moore-Penrose pseudoinverse of $A$.

Fix $x_y \in \mathbb{R}^n$, and note that $x^*$ is an optimal solution to LP (95) if and only if $x^* - x_y$ is an
optimal solution to the linear program

\begin{align*}
\text{minimize} \quad & \mathbf{1}^T (x + x_y) \\
\text{subject to:} \quad & A(x + x_y) \succeq y
\end{align*}

with variable $x \in \mathbb{R}^n$. With this in mind, put $x_y = A^+ y$ and define $b = y - A x_y$. By definition of
the pseudoinverse, $x_y$ is the vector of minimum Euclidean norm which minimizes $\|A x_y - y\|_2$.
Moreover, $b \in \mathcal{N}(A)$.
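The membership $b \in \mathcal{N}(A)$ can be checked directly. The short derivation below is added for illustration and uses two standard facts about the Moore-Penrose pseudoinverse: $A A^+ A = A$, and $A A^+ = A^+ A$ when $A$ is symmetric.

```latex
\begin{align*}
b  &= y - A x_y = (I - A A^{+})\, y , \\
Ab &= A y - A (A A^{+})\, y = A y - (A A^{+} A)\, y = A y - A y = 0 .
\end{align*}
```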

Thus, in order to prove the lemma, it suffices to show the existence of an optimal solution $x^*$
to the linear program

\begin{align}
\text{minimize} \quad & \mathbf{1}^T x \tag{96} \\
\text{subject to:} \quad & A x \succeq b \notag
\end{align}

which also satisfies the additional constraints

\[ |x_i^*| \le c_A \|b\|_2 \quad \text{for } i = 1, \dots, n, \tag{97} \]

where $c_A$ is a constant depending only on $A$.

Claim 1: There exists an optimal solution $x^*$ to Linear Program (96) which satisfies

\[ x_i^* \le (d\lambda)^{-1} n \|b\|_\infty \quad \text{for } i = 1, \dots, n. \tag{98} \]

The proof relies heavily on duality. The reader is directed to [30] or any other standard text for
details.

To prove the claim, consider LP (96). By premultiplying the inequality constraint by $d^{-1} \mathbf{1}^T$
on both sides, we see that $\mathbf{1}^T x \ge d^{-1} \mathbf{1}^T b > -\infty$. Hence, the objective is bounded from below,
which implies that strong duality holds. Thus, let $\bar{z}$ be an optimal solution to the dual LP of
(96):

\begin{align}
\text{maximize} \quad & b^T z \tag{99} \\
\text{subject to:} \quad & A z = \mathbf{1} \notag \\
& z \succeq 0 \notag
\end{align}

with dual variable $z \in \mathbb{R}^n$.
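The lower bound on $\mathbf{1}^T x$ used at the start of this argument follows from the column-sum assumption on $A$. Spelled out (assuming $d > 0$, which holds here since $A$ is nonnegative and $d$ is a nonzero eigenvalue):

```latex
\begin{align*}
\mathbf{1}^T A = d\, \mathbf{1}^T
\quad \Longrightarrow \quad
\mathbf{1}^T x = d^{-1} \mathbf{1}^T A x \ \ge\ d^{-1} \mathbf{1}^T b
\quad \text{whenever } A x \succeq b .
\end{align*}
```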
Next, consider the dual LP of (96) with the additional inequality constraints corresponding to
(98):

\begin{align}
\text{maximize} \quad & b^T z - (d\lambda)^{-1} n \|b\|_\infty \mathbf{1}^T y \tag{100} \\
\text{subject to:} \quad & A z = \mathbf{1} + y \notag \\
& z \succeq 0 \notag \\
& y \succeq 0 \notag
\end{align}

with dual variables $z \in \mathbb{R}^n$ and $y \in \mathbb{R}^n$. Equivalently, by setting $z = \bar{z} + \Delta z$ and observing that
$y = A \Delta z$, we can write the dual LP (100) as

\begin{align}
\text{maximize} \quad & b^T \bar{z} + b^T \Delta z - (d\lambda)^{-1} n \|b\|_\infty \mathbf{1}^T A \Delta z \tag{101} \\
\text{subject to:} \quad & A \Delta z \succeq 0 \notag \\
& \bar{z} + \Delta z \succeq 0 \notag
\end{align}

with dual variables $\Delta z \in \mathbb{R}^n$. We prove the claim by showing that the dual LPs (99) and (101)
have the same optimal value. Since strong duality holds, the corresponding primal problems
must also have the same optimal value.

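Without loss of generality, we can uniquely decompose $\Delta z = \Delta z_1 + \Delta z_2$ where $\Delta z_1 \in \mathcal{N}(A)$
and $\Delta z_2 \in \mathcal{N}^\perp(A)$. Since $b \in \mathcal{N}(A)$, we have $b^T \Delta z_2 = 0$ and we can rewrite (101) yet again
as

\begin{align}
\text{maximize} \quad & b^T \bar{z} + b^T \Delta z_1 - (d\lambda)^{-1} n \|b\|_\infty \mathbf{1}^T A \Delta z_2 \tag{102} \\
\text{subject to:} \quad & A \Delta z_2 \succeq 0 \notag \\
& \bar{z} + \Delta z_1 + \Delta z_2 \succeq 0 \tag{103} \\
& \Delta z_1 \in \mathcal{N}(A),\ \Delta z_2 \in \mathcal{N}^\perp(A). \notag
\end{align}

By definition of $\lambda$, for any unit vector $u \in \mathcal{N}^\perp(A)$ with $\|u\|_2 = 1$ we have $\|Au\|_2 \ge \lambda$. Using
this and the fact that $A \Delta z_2 \succeq 0$ for all feasible $\Delta z_2$, we have the following inequality:

\[ \mathbf{1}^T A \Delta z_2 = \|A \Delta z_2\|_1 \ge \|A \Delta z_2\|_2 \ge \lambda \|\Delta z_2\|_2. \]

Thus, the objective (102) can be upper bounded as follows:

\[ b^T \bar{z} + b^T \Delta z_1 - (d\lambda)^{-1} n \|b\|_\infty \mathbf{1}^T A \Delta z_2 \le b^T \bar{z} + b^T \Delta z_1 - d^{-1} n \|b\|_\infty \|\Delta z_2\|_2. \tag{104} \]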
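The step from the norm inequality to (104) amounts to multiplying by the nonpositive factor $-(d\lambda)^{-1} n \|b\|_\infty$, which reverses the inequality and cancels $\lambda$. Written out for illustration:

```latex
\begin{align*}
-(d\lambda)^{-1} n \|b\|_\infty \, \mathbf{1}^T A \Delta z_2
  \ \le\ -(d\lambda)^{-1} n \|b\|_\infty \cdot \lambda \|\Delta z_2\|_2
  \ =\ -d^{-1} n \|b\|_\infty \|\Delta z_2\|_2 .
\end{align*}
```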
Next, we obtain an upper bound on $b^T \Delta z_1$. To this end, observe that constraint (103) implies
that $\bar{z} + \Delta z_1 \succeq -\mathbf{1} \|\Delta z_2\|_\infty$. Motivated by this, consider the following $\epsilon$-perturbed LP:

\begin{align}
\text{minimize} \quad & -b^T v \tag{105} \\
\text{subject to:} \quad & \bar{z} + v \succeq -\epsilon \mathbf{1} \notag \\
& v \in \mathcal{N}(A) \notag
\end{align}

with variable $v$. Let $p^*(\epsilon)$ denote the optimal value of the $\epsilon$-perturbed problem. First observe
that $p^*(0) = 0$. To see this, note that if $\bar{z} + v \succeq 0$, then $b^T v \le 0$, else we would contradict the
optimality of $\bar{z}$ since $z = \bar{z} + v$ is a feasible solution to the dual LP (99) in this case. Now, weak
duality implies

\[ -b^T v \ge p^*(\epsilon) \ge p^*(0) - \epsilon \mathbf{1}^T w^*, \tag{106} \]

where $w^*$ corresponds to an optimal solution to the dual LP of the unperturbed primal LP (105),
given by:

\begin{align}
\text{maximize} \quad & -\bar{z}^T (A w - b) \tag{107} \\
\text{subject to:} \quad & A w \succeq b. \notag
\end{align}

Hence, (106) implies that

\[ b^T \Delta z_1 \le \|\Delta z_2\|_\infty \mathbf{1}^T w^* \tag{108} \]

if $\Delta z_1, \Delta z_2$ are feasible for LP (102).

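By definition of $\bar{z}$, $\bar{z}^T A = \mathbf{1}^T$, and hence a vector $w^*$ is optimal for (107) if and only if it
also optimizes:

\begin{align*}
\text{minimize} \quad & \mathbf{1}^T w \\
\text{subject to:} \quad & A w \succeq b.
\end{align*}

Combining this with (108), we have

\[ b^T \Delta z_1 \le \|\Delta z_2\|_\infty \mathbf{1}^T w^* \le \|\Delta z_2\|_\infty \mathbf{1}^T w \]

for any vector $w$ satisfying $A w \succeq b$. Trivially, $w = d^{-1} \|b\|_\infty \mathbf{1}$ satisfies this, and hence we obtain:

\[ b^T \Delta z_1 \le d^{-1} n \|b\|_\infty \|\Delta z_2\|_\infty. \]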
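The choice $w = d^{-1} \|b\|_\infty \mathbf{1}$ can be verified in one line. Since $A$ is symmetric with column sums $d$, its row sums are also $d$, so $A \mathbf{1} = d\, \mathbf{1}$. The check below is added for illustration:

```latex
\begin{align*}
A w = d^{-1} \|b\|_\infty A \mathbf{1} = \|b\|_\infty \mathbf{1} \succeq b ,
\qquad
\mathbf{1}^T w = d^{-1} n \|b\|_\infty .
\end{align*}
```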
Finally, we substitute this into (104) and see that

\begin{align*}
b^T z &\le b^T \bar{z} + d^{-1} n \|b\|_\infty \|\Delta z_2\|_\infty - d^{-1} n \|b\|_\infty \|\Delta z_2\|_2 \\
&\le b^T \bar{z} + d^{-1} n \|b\|_\infty \|\Delta z_2\|_2 - d^{-1} n \|b\|_\infty \|\Delta z_2\|_2 \\
&= b^T \bar{z}
\end{align*}

for all vectors $z$ which are feasible for the dual LP (100). This completes the proof of Claim 1.



Claim 2: There exists an optimal solution $x^*$ to Linear Program (96) which satisfies

\[ |x_i^*| \le c_A \|b\|_2 \quad \text{for } i = 1, \dots, n \tag{109} \]

for some constant $c_A$ depending only on $A$.

First note that $\|b\|_\infty \le \|b\|_2$ for any $b \in \mathbb{R}^n$, hence it suffices to prove the claim for the
infinity norm. Claim 1 shows that each of the $x_i$'s can be upper bounded by $(d\lambda)^{-1} n \|b\|_\infty$
without affecting the optimal value of LP (96). To see the lower bound, let $a_j^T$ be a row of $A$
with entry $a_{ji} \ge d/n$ in the $i$-th coordinate (at least one exists for each $i$ since the columns of $A$
sum to $d$). Now, the inequality constraint $A x \succeq b$ combined with the upper bound on each $x_i$
implies:

\[ a_{ji} x_i + (d - a_{ji}) \lambda^{-1} n \|b\|_\infty \ge a_j^T x \ge b_j \ge -\|b\|_\infty. \tag{110} \]

Since $a_{ji} \ge d/n$, (110) implies:

\[ x_i \ge -\lambda^{-1} n (n - 1) \|b\|_\infty. \]

Hence, we can take $c_A = \lambda^{-1} n \times \max\{n - 1, d^{-1}\}$. This proves Claim 2 and, by our earlier
remarks, proves the lemma. ■